[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912326#comment-16912326 ]

Wes McKinney commented on ARROW-6206:

> As I find the time for signups and 2fa’s I will compose this to the lists

This shouldn't be too complicated, all you have to do is send an e-mail to dev-subscr...@arrow.apache.org

> [Java][Docs] Document environment variables/java properties
> ---
>
> Key: ARROW-6206
> URL: https://issues.apache.org/jira/browse/ARROW-6206
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Documentation, Java
> Reporter: Micah Kornfield
> Assignee: Ji Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Specifically, "-Dio.netty.tryReflectionSetAccessible=true" for JVMs >= 9 and
> BoundsChecking/NullChecking for get.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911955#comment-16911955 ]

Jim Northrup commented on ARROW-6206:

(previously responded as email, sorry if this creates a dupe)

I admire Arrow for doing a thing well. I hope that if I simply call “mvn maven-versions-plugin:latest” in the future this simple jdbc code will work better than before. I appreciate the attention to the details.

I think the gist of this discussion is that TensorFlow one-hot columns may quickly test the expected norms of Arrow. Likewise, timeseries datasets have us blowing gaskets all over the place in terms of time-to-completion and RAM using pandas. What do we do with a 300 gig numpy dataset living in swap that takes 3 days to build? There are no LSTM examples that demonstrate anything but toy datasets.

Turbodbc looks like a good fit for reducing transcription times. For what I need in the space of Arrow, I think the ideal tool is something to work in and out of numpy and delegate to and from Apache Geode or Hazelcast as the main substrate. If perchance Arrow can act as a window to memory grids, all the better.

As I find the time for signups and 2fa’s I will compose this to the lists.
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911922#comment-16911922 ]

Micah Kornfield commented on ARROW-6206:

"iiuc arrow is a team that picked up netty derived off-heap tools naively and demonstrated that in 2019 it's still prone to some gotchas that are a little bit stronger than edge cases when the unit tests pass."

It is true the Java Arrow library has a steep learning curve, and could use better documentation so new developers aren't bitten. There has also been less focus on the non-core Java libraries (i.e. adapters) until recently, and we need to do something to distinguish the maturity between them so these types of things are less surprising. If you have suggestions please let us know. I would suggest perhaps sending mail to the dev@ or user@ mailing lists, since generally more people monitor those than conversations on JIRA.

FWIW, the core library was adapted from Apache Drill and used by Dremio in their product, both of which, iiuc, are long-running processes that provide competitive analytic performance (I don't know how prone to resource leakage they are).

"and gave me the confidence to assume this will do the job faster than python. and so began this thread on 800+ megabytes of data."

I'm sorry you ran into this. If you are working in the Python ecosystem, Turbodbc might be your best bet for getting data into Arrow. In general, most of the Python code is just a facade on top of C++, so I would expect it to be pretty performant. Please discuss on the mailing list or continue to file JIRAs if you are seeing unexpected performance/behavior. We want to know.

"you should really put a consumer facing notice on where NIO is and is not present."

Would you mind opening up a JIRA/Pull Request describing how you think it is best to publicize it?
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911465#comment-16911465 ]

Jim Northrup commented on ARROW-6206:

> Could you provide a link to the text you quoted I'd be interested in reading it.

this is the benefit of having written what amounts to a netty analog over the course of 4 years, including an SSL/TLS sockets layer for http at one point.

ultimately there is danger in long-lived services using NIO, end-of-story. the process cleanup of the underlying OS will be the best protection against java NIO/JNI memory handles. if you have a daemon or long-running thing, or you must use directbuffers, assume that the reference counting is imperfect, and it will bite you one day (it may take days) if you trust it. so things that use nio should be short lived, and wherever possible process encapsulated.

netty is the jboss-endorsed c10k java representative with the popular marketshare. iiuc arrow is a team that picked up netty derived off-heap tools naively and demonstrated that in 2019 it's still prone to some gotchas that are a little bit stronger than edge cases when the unit tests pass. indeed, my initial testing with writing jdbc to arrow on kilobytes of records succeeded well, and gave me the confidence to assume this will do the job faster than python. and so began this thread on 800+ megabytes of data.

considering the age and size of the netty ecosystem, there is no lack of scrutiny or open source virtue here. it's a VM-level weakness that java NIO is still something like peanuts in the kitchen. you should really put a consumer facing notice on where NIO is and is not present.
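One way to read the "short lived, process encapsulated" advice above is to run the off-heap work in a child JVM, so that the operating system reclaims native memory when the process exits. The sketch below only uses the JDK's `ProcessBuilder`; the class name `ChildJvmSketch` and the main class passed to it are hypothetical, and nothing here is taken from Arrow or Netty themselves.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the command line for a short-lived child JVM.
final class ChildJvmSketch {

    // Assembles "<javaHome>/bin/java -D... -cp <classpath> <mainClass>".
    static List<String> childCommand(String javaHome, String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        // The flag named in this issue; passed explicitly to the child JVM.
        cmd.add("-Dio.netty.tryReflectionSetAccessible=true");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    // Reuses the parent's JVM installation and classpath for the child.
    static ProcessBuilder childProcess(String mainClass) {
        return new ProcessBuilder(childCommand(
                System.getProperty("java.home"),
                System.getProperty("java.class.path"),
                mainClass)).inheritIO();
    }
}
```

A caller would then do something like `ChildJvmSketch.childProcess("com.example.ExportJob").start().waitFor()`, where `com.example.ExportJob` stands in for whatever class performs the actual export.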
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906350#comment-16906350 ]

Micah Kornfield commented on ARROW-6206:

[~tianchen92] thanks for volunteering to do this.
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906345#comment-16906345 ]

Micah Kornfield commented on ARROW-6206:

"is there a charter for what java usecases will be supported, and THEN, what among these items will leverage NIO, and what among these can use pure heap implementations of objects exclusively?"

I don't fully understand this question. My best attempt at an answer is below:

The system property is needed because we use Netty as an off-heap memory allocator; this could potentially be replaced with something JNI based. The core of the current Java implementation is off-heap memory. If you have specific requirements/use-cases in mind, discussing them on the dev@ or user@ mailing list is probably the way to go.

Could you provide a link to the text you quoted? I'd be interested in reading it.
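For readers unfamiliar with the term, "off-heap" memory lives outside the Java heap that the garbage collector manages. A minimal sketch using only the JDK's `ByteBuffer` is shown below; it illustrates the heap/off-heap distinction and is not how Arrow or Netty allocate memory internally. The class name `OffHeapSketch` is made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of heap versus off-heap (direct) buffers.
final class OffHeapSketch {

    // Allocated outside the Java heap; not limited by -Xmx.
    static ByteBuffer offHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    // Backed by an ordinary byte[] on the Java heap.
    static ByteBuffer onHeap(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = offHeap(64);
        ByteBuffer heap = onHeap(64);
        // Both expose the same read/write API.
        direct.putInt(0, 42);
        heap.putInt(0, 42);
        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap.isDirect()   = " + heap.isDirect());
    }
}
```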
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906033#comment-16906033 ]

Jim Northrup commented on ARROW-6206:

NIO is not going to go away, and java is not going to stop harboring unreproducible NIO bugs.

is there a charter for what java usecases will be supported, and THEN, what among these items will leverage NIO, and what among these can use pure heap implementations of objects exclusively?

the utilities should abandon all hope of stability or useful benchmarks while there is a NIO component in a piece of code. the oracle engineers this year are certainly not on the same page as the jdk8 team, or the jdk6 team.

Unsafe/NIO usecases number about 2: if you're utilizing mmap files to minimize page faults, go there. if you're talking to crossplatform structs and mailboxes, you have no choice.

if you're squirreling away heap objects using something greater than the -Xmx setting, you should probably engineer it through mmap file access instead of using native handles directly. this is extremely unstable in my experience.
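The mmap-file approach mentioned above can be sketched with the JDK's `FileChannel.map`. This is only an illustration of memory-mapped file access under simple assumptions (a small temporary file, fixed-width `long` values); the class `MappedFileSketch` and its methods are hypothetical and not part of Arrow.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: store longs in a memory-mapped file and read them back.
final class MappedFileSketch {

    // Writes the values through a mapping of the file, then sums them back.
    static long writeAndSum(Path file, long[] values) throws IOException {
        long size = (long) values.length * Long.BYTES;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current size grows the file.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (long v : values) {
                buffer.putLong(v);
            }
            buffer.force();   // flush changes to the backing file
            buffer.flip();    // switch from writing to reading
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.getLong();
            }
            return sum;
        }
    }

    // Convenience wrapper that uses a temporary file and cleans up after itself.
    static long demo(long[] values) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".bin");
            try {
                return writeAndSum(file, values);
            } finally {
                file.toFile().deleteOnExit();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The data lives in the operating system's page cache and the backing file rather than on the Java heap, so it is not bounded by -Xmx.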
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904834#comment-16904834 ]

Ji Liu commented on ARROW-6206:

Fine, no problem.
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904832#comment-16904832 ]

Micah Kornfield commented on ARROW-6206:

I would say we don't need to include "stricter" things in the README, since they aren't really exceptions.
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904831#comment-16904831 ]

Ji Liu commented on ARROW-6206:

It seems you are right, the reference was in test execution.

If you would like to provide a PR for this, I think the 'Java Code Style Guide' could also be updated (unused imports, redundant modifier), or I can take this issue :) (If so, please let me know if there is other info that should be updated besides the items above.)
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904828#comment-16904828 ]

Micah Kornfield commented on ARROW-6206:

It could probably be set for all versions, but I think it is only required past JVM 8 (could be mistaken though).

"true is already in pom.xml, it doesn't work?"

The only reference I found was for test execution in pom.xml. I think consumers of the library have to set this property themselves when running the JVM. But my Maven knowledge is weak, so I might be misunderstanding something.
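Since consumers apparently have to pass the property themselves, an application could check for it at startup and warn when it is missing. The sketch below relies only on standard system properties; the class `NettyReflectionFlagCheck` is hypothetical, and the "JVM 9 or later" threshold follows the issue description, which the comment above itself treats as uncertain.

```java
// Hypothetical startup check for the flag discussed in this issue.
final class NettyReflectionFlagCheck {

    static final String PROPERTY = "io.netty.tryReflectionSetAccessible";

    // Extracts the major version from java.specification.version ("1.8" or "11").
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Versions up to 8 are reported as "1.x"; later ones as "9", "11", ...
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // True when the JVM is 9 or later and the property is not set to "true".
    static boolean flagMissing(int major, String propertyValue) {
        return major >= 9 && !Boolean.parseBoolean(propertyValue);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        if (flagMissing(major, System.getProperty(PROPERTY))) {
            System.err.println("Warning: consider starting the JVM with -D" + PROPERTY + "=true");
        }
    }
}
```

The property is normally supplied on the command line, for example `java -Dio.netty.tryReflectionSetAccessible=true -cp app.jar com.example.Main`, where the jar and main class are placeholders.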
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904819#comment-16904819 ]

Ji Liu commented on ARROW-6206:

Yes, I think Wes has created an issue for reStructuredText: https://issues.apache.org/jira/browse/ARROW-5542.

One more question: why should the JVM parameter be set for JVM >= 9? I'm not quite familiar with this. It seems the property is already set to true in pom.xml; doesn't that work?
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904816#comment-16904816 ]

Micah Kornfield commented on ARROW-6206:

arrow/java/README.md is what I was thinking. At some point we might want to create more formal docs using reStructuredText at: [https://github.com/apache/arrow/tree/master/docs/source]
[jira] [Commented] (ARROW-6206) [Java][Docs] Document environment variables/java properties
[ https://issues.apache.org/jira/browse/ARROW-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904808#comment-16904808 ]

Ji Liu commented on ARROW-6206:

I'm curious where the docs should be updated. Is it /arrow/java/README.md? I just noticed that something else in this file should also be updated, like the 'Java Code Style Guide'.