[jira] [Commented] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")

2020-10-10 Thread Josh Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211816#comment-17211816
 ] 

Josh Taylor commented on ARROW-10242:
-

I couldn't get this to fail again. After rebuilding everything, basic querying 
seems to work now.

Thanks!

> Parquet reader thread terminated due to error: ExecutionError("sending on a 
> disconnected channel")
> --
>
> Key: ARROW-10242
> URL: https://issues.apache.org/jira/browse/ARROW-10242
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Affects Versions: 2.0.0
>Reporter: Josh Taylor
>Assignee: Andy Grove
>Priority: Major
>
> *Running the latest code from GitHub for DataFusion and Parquet.*
> When trying to read a directory of about 210 Parquet files (3.2 GB in total, 
> each file 13-18 MB) with the following code:
> {code}
> let mut ctx = ExecutionContext::new();
> // register the directory of Parquet files with the execution context
> ctx.register_parquet(
>  "something",
>  "/home/josh/dev/pat/fff/"
> )?;
> // execute the query
> let df = ctx.sql(
>  "select * from something",
> )?;
> let results = df.collect().await?;
>  
> {code}
> I get the following error about 204 times:
> {noformat}
> Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")

2020-10-08 Thread Josh Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210605#comment-17210605
 ] 

Josh Taylor commented on ARROW-10242:
-

Hi [~andygrove],

I'm not sure whether I'm using any nested types; they should all be fairly 
primitive. I'll start by removing all the fields and then adding them back one 
at a time to see which one makes it explode.

Thanks for the swift response!
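The add-one-field-at-a-time plan can be sketched as a small loop. This is a minimal sketch, not DataFusion code: `first_failing_column` and the `run_query` closure are hypothetical names, and in practice the closure would wrap something like `ctx.sql(...)` followed by `collect()`. The column names and the mock failure on `tags` are made up purely for illustration.

```rust
/// Hypothetical helper: adds columns to the SELECT list one at a time and
/// returns the first column whose addition makes the query fail, together
/// with the error. `run_query` stands in for whatever executes the SQL.
fn first_failing_column<F>(
    table: &str,
    columns: &[&str],
    mut run_query: F,
) -> Option<(String, String)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    for (i, col) in columns.iter().enumerate() {
        // SELECT the first i + 1 columns, e.g. "SELECT id, name FROM something".
        let sql = format!("SELECT {} FROM {}", columns[..=i].join(", "), table);
        if let Err(e) = run_query(&sql) {
            return Some((col.to_string(), e));
        }
    }
    None
}

fn main() {
    // Made-up column names; real ones would come from the Parquet schema.
    let columns = ["id", "name", "tags"];
    // Mock executor that pretends any query touching `tags` fails.
    let result = first_failing_column("something", &columns, |sql| {
        if sql.contains("tags") {
            Err(format!("query failed: {}", sql))
        } else {
            Ok(())
        }
    });
    match result {
        Some((col, err)) => println!("first failing column: {} ({})", col, err),
        None => println!("all columns read successfully"),
    }
}
```

Running it prints `first failing column: tags (query failed: SELECT id, name, tags FROM something)`. Adding columns cumulatively, rather than querying each one alone, mirrors the plan of rebuilding the schema field by field.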



[jira] [Commented] (ARROW-10242) Parquet reader thread terminated due to error: ExecutionError("sending on a disconnected channel")

2020-10-08 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210600#comment-17210600
 ] 

Andy Grove commented on ARROW-10242:


Hi [~joshx], and thanks for the bug report. I was unable to reproduce the issue 
with any of the Parquet data sets I usually test with, but those are simple 
data sets containing only primitive types. My first guess is that the files 
contain something DataFusion doesn't support and the underlying error message 
is being suppressed, but this is just a guess. Do your files contain nested types?

Do you see any other errors before the disconnected channel error?
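One way a root-cause error can get buried: if several reader threads feed one channel and the consumer stops at the first error, it drops the receiving end, and every reader that tries to send afterwards fails with a disconnected-channel error. The sketch below assumes one reader thread per file and uses `std::sync::mpsc` with hypothetical names (`read_file`, `consume`, `run`, `Batch`); it is not DataFusion's actual reader. It sends `Result` values through the channel so the consumer returns the original error instead of only the follow-on send failures. Note that std's `SendError` displays as "sending on a closed channel"; the wording in this report matches crossbeam-channel's `SendError`, which suggests, but does not confirm, that the reader used crossbeam channels.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

// Hypothetical stand-in for a decoded record batch.
type Batch = Vec<i32>;

/// Hypothetical reader for "file" `i`: sends `batches_per_file` one-row batches,
/// except that the file equal to `bad_file` sends an error instead (standing in
/// for, e.g., an unsupported column type). Returns false if a send failed
/// because the receiving end had already been dropped.
fn read_file(
    i: usize,
    batches_per_file: usize,
    bad_file: Option<usize>,
    tx: SyncSender<Result<Batch, String>>,
) -> bool {
    for b in 0..batches_per_file {
        let item = if bad_file == Some(i) {
            Err(format!("file {}: unsupported column type", i))
        } else {
            Ok(vec![b as i32])
        };
        let is_err = item.is_err();
        if let Err(e) = tx.send(item) {
            // Follow-on failure: the consumer has already gone away.
            eprintln!("reader {} terminated: {}", i, e);
            return false;
        }
        if is_err {
            return true; // the real error was delivered to the consumer
        }
    }
    true
}

/// Counts rows until the channel closes, returning the first error it sees.
/// Returning early drops `rx`, which disconnects every sender still running.
fn consume(rx: Receiver<Result<Batch, String>>) -> Result<usize, String> {
    let mut rows = 0;
    for item in rx {
        rows += item?.len();
    }
    Ok(rows)
}

/// Spawns one reader thread per file (an assumption of this sketch) and returns
/// the consumer's outcome plus how many readers hit a closed channel.
fn run(n_files: usize, batches_per_file: usize, bad_file: Option<usize>) -> (Result<usize, String>, usize) {
    let (tx, rx) = sync_channel::<Result<Batch, String>>(1);
    let handles: Vec<_> = (0..n_files)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || read_file(i, batches_per_file, bad_file, tx))
        })
        .collect();
    drop(tx); // only the reader threads hold senders now

    let outcome = consume(rx);
    let disconnected = handles
        .into_iter()
        .map(|h| h.join().expect("reader thread panicked"))
        .filter(|&ok| !ok)
        .count();
    (outcome, disconnected)
}

fn main() {
    println!("{:?}", run(4, 3, None));
    let (outcome, disconnected) = run(8, 50, Some(2));
    println!("root cause: {:?}; readers that hit a closed channel: {}", outcome, disconnected);
}
```

With no failing file, `run(4, 3, None)` returns `(Ok(12), 0)`. With `Some(2)`, the consumer returns `Err("file 2: unsupported column type")`, while the number of readers that report a closed channel varies between runs because it depends on thread scheduling.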
