[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description: 
The CSV datasource supports the maxColumns and maxCharsPerColumn options, but those two options do not allow limiting the row size properly.

For instance, if I want to limit the row size to at most 100 characters and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# A user cannot read a column longer than 10 characters even if the row size is <= 100
# A user cannot read more than 10 columns even if the row size is <= 100

I suggest adding an additional option, maxCharsPerRow.
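
A minimal sketch of the intended usage, assuming a SparkSession named spark (as in spark-shell); maxCharsPerRow is only the proposed option name and does not exist yet:

{code:scala}
// Existing options: maxColumns caps the number of columns per row,
// maxCharsPerColumn caps the number of characters in any single column.
val df = spark.read
  .option("maxColumns", "10")
  .option("maxCharsPerColumn", "10")
  // Proposed (not yet implemented): cap the total number of characters per row,
  // regardless of how they are spread across the columns.
  // .option("maxCharsPerRow", "100")
  .csv("/path/to/data.csv")
{code}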

  was:
The CSV datasource supports the maxColumns and maxCharsPerColumn options, but those two options do not allow limiting the row size properly.

For instance, if I want to limit the row size to at most 100 characters and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# A user cannot read a column longer than 10 characters even if the row size is <= 100
# A user cannot read more than 10 columns, even if each column is under 5 characters and the row size is <= 100

I suggest adding an additional option, maxCharsPerRow.


> Add maxCharsPerRow option for CSV datasource
> 
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dmitry Sysolyatin
>Priority: Major
>
> The CSV datasource supports the maxColumns and maxCharsPerColumn options, but
> those two options do not allow limiting the row size properly.
> For instance, if I want to limit the row size to at most 100 characters and I
> set maxColumns to 10 and maxCharsPerColumn to 10, then
> # A user cannot read a column longer than 10 characters even if the row size is <= 100
> # A user cannot read more than 10 columns even if the row size is <= 100
> I suggest adding an additional option, maxCharsPerRow.






[jira] [Created] (SPARK-44037) Add maxCharsPerRow option for CSV datasource

2023-06-13 Thread Dmitry Sysolyatin (Jira)
Dmitry Sysolyatin created SPARK-44037:
-

 Summary: Add maxCharsPerRow option for CSV datasource
 Key: SPARK-44037
 URL: https://issues.apache.org/jira/browse/SPARK-44037
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dmitry Sysolyatin


The CSV datasource supports the maxColumns and maxCharsPerColumn options, but those two options do not allow limiting the row size properly.

For instance, if I want to limit the row size to at most 100 characters and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# A user cannot read a column longer than 10 characters even if the row size is <= 100
# A user cannot read more than 10 columns, even if each column is under 5 characters and the row size is <= 100






[jira] [Created] (SPARK-36561) Remove `ColumnVector.numNulls`

2021-08-23 Thread Dmitry Sysolyatin (Jira)
Dmitry Sysolyatin created SPARK-36561:
-

 Summary: Remove `ColumnVector.numNulls`
 Key: SPARK-36561
 URL: https://issues.apache.org/jira/browse/SPARK-36561
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Dmitry Sysolyatin


Hi!
While implementing the ColumnVector abstract class, I checked where `numNulls` is used in the Spark source code. I did not find any place where it is used except in tests.

Are there any plans to use it in the future? If not, I propose removing the `numNulls` method.
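
For reference, a minimal sketch of what `numNulls` reports, assuming Spark's internal OnHeapColumnVector (an internal class whose API may change between versions):

{code:scala}
import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.IntegerType

// Build a small writable vector and mark one entry as null.
val vec = new OnHeapColumnVector(3, IntegerType)
vec.putInt(0, 1)
vec.putNull(1)
vec.putInt(2, 3)

// numNulls counts the null entries written so far; per this report it is
// otherwise exercised only by tests.
println(vec.numNulls()) // prints 1

vec.close()
{code}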



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org