[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-21 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912326#comment-16912326
 ] 

Wes McKinney commented on ARROW-6206:
-

> As I find the time for signups and 2fa’s I will compose this to the lists

This shouldn't be too complicated; all you have to do is send an e-mail to 
dev-subscr...@arrow.apache.org

> [Java][Docs] Document environment variables/java properties
> ---
>
> Key: ARROW-6206
> URL: https://issues.apache.org/jira/browse/ARROW-6206
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, Java
>Reporter: Micah Kornfield
>Assignee: Ji Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Specifically, "-Dio.netty.tryReflectionSetAccessible=true" for JVMs >= 9 and 
> BoundsChecking/NullChecking for get.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-21 Thread Jim Northrup (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911955#comment-16911955
 ] 

Jim Northrup commented on ARROW-6206:
-

(previously responded by email, sorry if this creates a dupe)

 

I admire Arrow for doing a thing well.  I hope that if I simply call “mvn 
maven-versions-plugin:latest” in the future this simple jdbc code will work 
better than before.

 

I appreciate the attention to the details.

 

I think the gist of this discussion is that TensorFlow one-hot columns may 
quickly test the expected norms of Arrow.  Likewise, time-series datasets have 
us blowing gaskets all over the place in terms of time-to-completion and RAM 
when using pandas.  What do we do with a 300 GB numpy dataset living in swap 
that takes 3 days to build? There are no LSTM examples that demonstrate 
anything but toy datasets.

 

Turbodbc looks like a good fit for reducing transcription times.

 

For what I need in the space of Arrow, I think the ideal tool is something to 
work in and out of numpy and delegate to and from apache Geode or Hazelcast as 
the main substrate. 

 

If perchance arrow can act as a window to memory grids, all the better.

 

As I find the time for signups and 2fa’s I will compose this to the lists

 



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-20 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911922#comment-16911922
 ] 

Micah Kornfield commented on ARROW-6206:


"iiuc arrow is a team that picked up netty derived off-heap tools naively and 
demonstrated that in 2019 it's still prone to some gotchas that are a little 
bit stronger than edge cases when the unit tests pass."

It is true that the Java Arrow library has a steep learning curve and could use 
better documentation so new developers aren't bitten.  There has also been less 
focus on the non-core Java libraries (i.e. adapters) until recently, and we 
need to do something to distinguish the maturity between them so these types of 
things are less surprising.  If you have suggestions, please let us know.  I 
would suggest sending mail to the dev@ or user@ mailing lists, since generally 
more people monitor those than conversations on JIRA.  FWIW, the core library 
was adapted from Apache Drill and is used by Dremio in their product, both of 
which, iiuc, are long-running processes that provide competitive analytic 
performance (I don't know how prone to resource leakage they are).

 

"and gave me the confidence to assume this will do the job faster than python. 
and so began this thread on 800+ megabytes of data."

I'm sorry you ran into this.  If you are working in the Python ecosystem, 
Turbodbc might be your best bet for getting data into Arrow.  In general, most 
of the Python code is just a facade on top of C++, so I would expect it to be 
pretty performant.  Please discuss on the mailing list or continue to file 
JIRAs if you are seeing unexpected performance/behavior.  We want to know.

 

"you should really put a consumer facing notice on where NIO is and is not 
present."

Would you mind opening up a JIRA/Pull Request describing how you think it is 
best to publicize it?

 

 



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-20 Thread Jim Northrup (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911465#comment-16911465
 ] 

Jim Northrup commented on ARROW-6206:
-

> Could you provide a link to the text you quoted I'd be interested in reading 
> it.
 
this is the benefit of having written what amounts to a netty analog over the 
course of 4 years, including an SSL/TLS sockets layer for http at one point.  
ultimately there is danger in long-lived services using NIO, end-of-story.

the process cleanup of the underlying OS will be the best protection against 
leaked java NIO/JNI memory handles -- if you have a daemon or long-running 
thing, or you must use direct buffers, assume that the reference counting is 
imperfect, and it will bite you one day (it may take days) if you trust it.  so 
things that use NIO should be short-lived and, wherever possible, encapsulated 
in their own process.

netty is the jboss-endorsed c10k java representative with the popular 
marketshare. iiuc arrow is a team that picked up netty derived off-heap tools 
naively and demonstrated that in 2019 it's still prone to some gotchas that are 
a little bit stronger than edge cases when the unit tests pass.  indeed, my 
initial testing with writing jdbc to arrow on kilobytes of records succeeded 
well, and gave me the confidence to assume this will do the job faster than 
python.  and so began this thread on 800+ megabytes of data.

considering the age and size of the netty ecosystem, there is no lack of 
scrutiny or open source virtue here.  it's a VM-level weakness that java NIO is 
still something like peanuts in the kitchen, you should really put a consumer 
facing notice on where NIO is and is not present.







[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-13 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906350#comment-16906350
 ] 

Micah Kornfield commented on ARROW-6206:


[~tianchen92] thanks for volunteering to do this.



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-13 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906345#comment-16906345
 ] 

Micah Kornfield commented on ARROW-6206:


"is there a charter for what java usecases will be supported, and THEN, what 
among these items will leverage NIO, and what among these can use pure heap 
implementations of objects exclusively?"

I don't fully understand this question.  My best attempt to answer it is below:

The system property is needed because we use Netty as an off-heap memory 
allocator; this could potentially be replaced with something JNI-based.

The core of the current Java implementation is off-heap memory.  If you have 
specific requirements/use-cases in mind, discussing them on the dev@ or user@ 
mailing list is probably the way to go.

 

Could you provide a link to the text you quoted? I'd be interested in reading it.

 

 



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-13 Thread Jim Northrup (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906033#comment-16906033
 ] 

Jim Northrup commented on ARROW-6206:
-

NIO is not going to go away, and java is not going to stop harboring 
unreproducible NIO bugs.

is there a charter for what java usecases will be supported, and THEN, what 
among these items will leverage NIO, and what among these can use pure heap 
implementations of objects exclusively?

the utilities should abandon all hope of stability or useful benchmarks while 
there is a NIO component in a  piece of code.  the oracle engineers this year 
are certainly not on the same page as the jdk8 team, or the jdk6 team.  

Unsafe/NIO usecases number about 2:

if you're utilizing mmap files to minimize page faults, go there.
if you're talking to crossplatform structs and mailboxes, you have no choice.
if you're squirreling away heap objects using something greater than the -Xmx 
setting, you should probably engineer it through mmap file access instead of 
using native handles directly; the latter is extremely unstable in my experience.
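
A minimal sketch of the mmap-file approach mentioned above, using only the JDK's `FileChannel.map`. The class name `MmapSketch` and the method `roundTrip` are hypothetical names chosen for illustration; this is not Arrow code and makes no claim about how Arrow allocates memory.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: stores longs in a memory-mapped file
// instead of holding them on the Java heap.
final class MmapSketch {

    /** Writes the values into a mapped region of the file, reads them back, and returns their sum. */
    static long roundTrip(Path file, long[] values) throws IOException {
        long bytes = (long) values.length * Long.BYTES;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map the region; the OS pages it in and out on demand.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            for (long value : values) {
                buffer.putLong(value);
            }
            buffer.force(); // ask the OS to flush changes to the file
            buffer.flip();  // switch from writing to reading
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.getLong();
            }
            return sum;
        }
    }
}
```

The mapped pages live outside the heap, so the data size is not bounded by -Xmx, and the file itself remains the backing store if the process exits.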
 




[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904834#comment-16904834
 ] 

Ji Liu commented on ARROW-6206:
---

Fine, no problem.



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904832#comment-16904832
 ] 

Micah Kornfield commented on ARROW-6206:


I would say we don't need to include "stricter" things in the README, since
they aren't really exceptions.






[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904831#comment-16904831
 ] 

Ji Liu commented on ARROW-6206:
---

Seems you are right, the reference was in test execution.

If you would like to provide a PR for this, I think the 'Java Code Style Guide' 
could also be updated (unused imports, redundant modifier), or I can take this 
issue :) (If so, please let me know if there's other info that should be 
updated besides the above.)



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904828#comment-16904828
 ] 

Micah Kornfield commented on ARROW-6206:


It could probably be set, for all versions, but I think it is only required 
past JVM 8 (could be mistaken though).

 

"true
 is already in pom.xml, it doesn't work?"

The only reference I found was for test execution in pom.xml. I think consumers 
of the library have to set this property themselves when running the JVM.  But 
my Maven knowledge is weak, so I might be misunderstanding something.
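
As a rough illustration of what setting the property yourself could look like (a sketch, not taken from the Arrow docs): the flag can be passed at launch as `java -Dio.netty.tryReflectionSetAccessible=true ...`, or set early in `main` before any Netty or Arrow class is loaded. The class `NettyReflectionFlag` and its helper methods are hypothetical, and the assumption that Netty reads the property only once, during class initialization, should be checked against the Netty version in use.

```java
// Hypothetical helper: makes sure the Netty flag from the issue
// description is set before any Netty/Arrow class is touched.
final class NettyReflectionFlag {

    // Property name as quoted in the issue description.
    static final String PROPERTY = "io.netty.tryReflectionSetAccessible";

    /** Returns true when the property is currently set to "true" in this JVM. */
    static boolean isEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    /** Sets the flag to "true" only if the launcher did not already supply a value. */
    static void enableIfUnset() {
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, "true");
        }
    }

    public static void main(String[] args) {
        // Assumption: this runs before the first Netty or Arrow class is loaded.
        enableIfUnset();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

An explicit `-D` value on the command line always wins here, because the helper only fills in a value when none was given.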



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904819#comment-16904819
 ] 

Ji Liu commented on ARROW-6206:
---

Yes, I think Wes has created an issue for restructured text: 
https://issues.apache.org/jira/browse/ARROW-5542.

One more question: why should we set the JVM parameter for JVM >= 9?

I'm not very familiar with this, but it seems 

true 
is already in pom.xml. Doesn't that work?

 



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904816#comment-16904816
 ] 

Micah Kornfield commented on ARROW-6206:


arrow/java/README.md is what I was thinking. At some point we might want to 
create more formal docs using restructured text at: 
[https://github.com/apache/arrow/tree/master/docs/source]



[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties

2019-08-11 Thread Ji Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904808#comment-16904808
 ] 

Ji Liu commented on ARROW-6206:
---

I'm curious where to update the docs. Is it /arrow/java/README.md?

I just noticed that something in this file should also be updated, such as the 
'Java Code Style Guide'.
