[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more than 10 columns even if row size <= 100

I suggest to add additional option maxCharsPerRow

was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more than 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

> Add maxCharsPerRow option for CSV datasource
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dmitry Sysolyatin
> Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
> For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more than 10 columns even if row size <= 100
> I suggest to add additional option maxCharsPerRow

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
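The limitation described in SPARK-44037 can be illustrated with a small, self-contained model. This is only a sketch of the limits themselves, not Spark's CSV parser: the helper `violated_limit` and its parameter names are hypothetical, `maxCharsPerRow` is the option proposed in the issue rather than an existing one, and the row size is assumed to be the sum of the field lengths with delimiters not counted.

```python
from typing import List, Optional


def violated_limit(
    fields: List[str],
    max_columns: Optional[int] = None,
    max_chars_per_column: Optional[int] = None,
    max_chars_per_row: Optional[int] = None,
) -> Optional[str]:
    """Return the name of the first limit the row violates, or None if it passes.

    Hypothetical helper: it models the limits only and does not reflect
    how Spark's CSV datasource is implemented.
    """
    if max_columns is not None and len(fields) > max_columns:
        return "maxColumns"
    if max_chars_per_column is not None and any(
        len(field) > max_chars_per_column for field in fields
    ):
        return "maxCharsPerColumn"
    # Proposed option from the issue: cap the total characters in the row.
    # Assumption: delimiters are not counted toward the row size.
    if max_chars_per_row is not None and sum(len(f) for f in fields) > max_chars_per_row:
        return "maxCharsPerRow"
    return None


one_wide_column = ["x" * 50]        # 1 column, 50 characters in total
many_short_columns = ["abcd"] * 20  # 20 columns of 4 characters, 80 in total

# With maxColumns=10 and maxCharsPerColumn=10, both rows are rejected
# even though each row holds no more than 100 characters.
print(violated_limit(one_wide_column, max_columns=10, max_chars_per_column=10))
print(violated_limit(many_short_columns, max_columns=10, max_chars_per_column=10))

# A row-level cap alone accepts both rows and still rejects an oversized one.
print(violated_limit(one_wide_column, max_chars_per_row=100))
print(violated_limit(many_short_columns, max_chars_per_row=100))
print(violated_limit(["x" * 101], max_chars_per_row=100))
```

The two per-column limits only bound the row size by their product, so they reject rows that are small overall but unevenly shaped; a single row-level limit expresses the intended constraint directly.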
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more than 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

> Add maxCharsPerRow option for CSV datasource
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dmitry Sysolyatin
> Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
> For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more than 10 columns where each column < 5 chars even if row size <= 100
> I suggest to add additional option maxCharsPerRow
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

> Add maxCharsPerRow option for CSV datasource
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dmitry Sysolyatin
> Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those two options do not allow limit row size properly.
> For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more then 10 columns where each column < 5 chars even if row size <= 100
> I suggest to add additional option maxCharsPerRow
[jira] [Updated] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-44037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sysolyatin updated SPARK-44037:
--
Description:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100

I suggest to add additional option maxCharsPerRow

was:
CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100

> Add maxCharsPerRow option for CSV datasource
>
> Key: SPARK-44037
> URL: https://issues.apache.org/jira/browse/SPARK-44037
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dmitry Sysolyatin
> Priority: Major
>
> CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly.
> For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
> # User can not read column with size > 10 even if row size <= 100
> # User can not read more then 10 columns where each column < 5 chars even if row size <= 100
> I suggest to add additional option maxCharsPerRow
[jira] [Created] (SPARK-44037) Add maxCharsPerRow option for CSV datasource
Dmitry Sysolyatin created SPARK-44037:
--
Summary: Add maxCharsPerRow option for CSV datasource
Key: SPARK-44037
URL: https://issues.apache.org/jira/browse/SPARK-44037
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dmitry Sysolyatin

CSV datasource supports maxColumns and maxCharsPerColumn options. But those two option does not allow restrict row size properly.
For instance, if I want to limit the row size to be less than or equal to 100, and I set maxColumns to 10 and maxCharsPerColumn to 10, then
# User can not read column with size > 10 even if row size <= 100
# User can not read more then 10 columns where each column < 5 chars even if row size <= 100
[jira] [Created] (SPARK-36561) Remove `ColumnVector.numNulls`
Dmitry Sysolyatin created SPARK-36561:
--
Summary: Remove `ColumnVector.numNulls`
Key: SPARK-36561
URL: https://issues.apache.org/jira/browse/SPARK-36561
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.2.0
Reporter: Dmitry Sysolyatin

Hi!
While implementing the ColumnVector abstract class, I checked where `numNulls` is used in the Spark source code. I did not find any usages outside of tests.
Are there any plans to use it in the future? If not, I propose removing the `numNulls` method.

--
This message was sent by Atlassian Jira (v8.3.4#803005)