Re: Plasma Java API to read RecordBatch from Python process

2018-07-20 Thread Philipp Moritz
Hey Jieun,

Unfortunately, we can currently only transfer RecordBatch objects between C++
and Python using Plasma. I just opened a JIRA for doing it with Java too:
https://issues.apache.org/jira/browse/ARROW-2892.

The necessary pieces are there (in particular, there is a low-level API to
access Plasma from Java here: https://github.com/apache/arrow/pull/2065),
but it still requires some work, especially if we want to do it in a
zero-copy way.
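
For reference, the Python side of such a transfer looks roughly like the
sketch below (adapted from the pyarrow Plasma documentation; the /tmp/plasma
socket path is an assumption, and older pyarrow versions need extra arguments
to plasma.connect()):

```
import numpy as np
import pyarrow as pa
import pyarrow.plasma as plasma

# Assumes a store was started with something like:
#   plasma_store -m 1000000000 -s /tmp/plasma
# On older pyarrow versions, connect() takes two more arguments,
# e.g. plasma.connect("/tmp/plasma", "", 0).
client = plasma.connect("/tmp/plasma")

batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ["col1"])
# In practice the ObjectID has to be communicated to the reading process.
object_id = plasma.ObjectID(np.random.bytes(20))

# Measure the serialized size first, then stream the batch into a
# Plasma-allocated buffer and seal it so other processes can read it.
mock_sink = pa.MockOutputStream()
writer = pa.RecordBatchStreamWriter(mock_sink, batch.schema)
writer.write_batch(batch)
writer.close()

buf = client.create(object_id, mock_sink.size())
sink = pa.FixedSizeBufferWriter(buf)
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
writer.write_batch(batch)
writer.close()
client.seal(object_id)

# Reading it back (zero-copy) from another process:
[data] = client.get_buffers([object_id])
reader = pa.RecordBatchStreamReader(pa.BufferReader(data))
batch_back = reader.read_next_batch()
```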

Best,
Philipp.

On Fri, Jul 20, 2018 at 10:34 PM, Ji eun Jang wrote:

> Dear Arrow developers,
>
> I am a new Arrow user and need some help with the Plasma Java API. (I am
> using a library compiled from the github repo, not the 0.9.0 release.)
>
> I am trying to move data between two processes (one in Java and the other
> in Python). I was able to do it through sockets using Java
> ArrowStreamReader/Writer and Python RecordBatchStreamReader/Writer.
>
> Now I want to explore the Plasma option. I was able to use Plasma storage
> using PlasmaBuffer within Python, as shown in
> https://arrow.apache.org/docs/python/plasma.html.
>
> Do we have an equivalent API in Java? Could you please point me to an
> example, if one exists, showing how to read/write record batches from/to
> Plasma storage in Java?
>
> Thank you,
>
> Jieun
>
>


[jira] [Created] (ARROW-2892) [Plasma] Implement interface to get Java arrow objects from Plasma

2018-07-20 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2892:
-------------------------------------

 Summary: [Plasma] Implement interface to get Java arrow objects 
from Plasma
 Key: ARROW-2892
 URL: https://issues.apache.org/jira/browse/ARROW-2892
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


Currently we have a low-level interface to access bytes stored in Plasma from
Java, using JNI: [https://github.com/apache/arrow/pull/2065/]

 

As a followup, we should implement reading (and writing) Java Arrow objects
from Plasma, ideally in a zero-copy way.

 





[jira] [Created] (ARROW-2891) Preserve schema in write_to_dataset

2018-07-20 Thread Jonathan Kulzick (JIRA)
Jonathan Kulzick created ARROW-2891:
-------------------------------------

 Summary: Preserve schema in write_to_dataset
 Key: ARROW-2891
 URL: https://issues.apache.org/jira/browse/ARROW-2891
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Jonathan Kulzick


When using `pyarrow.parquet.write_to_dataset` with `partition_cols` set, the 
schema of the `table` passed into the function is not enforced when iterating 
over the `subgroup` to create the `subtable`. See 
[here](https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L1146).

Since pandas is used to generate the subtables, there is a risk that some 
specificity is lost from the original `table.schema` due to the data types 
supported by pandas and some of the internal type conversions pandas performs. 
It would be ideal if a `subschema` was generated from `table.schema` and passed 
to `Table` when instantiating the `subtable` to allow the user to enforce the 
original schema.
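
A minimal sketch of what that could look like (the `subtable_from_group`
helper name is hypothetical and not part of pyarrow; it only illustrates the
suggestion above):

```
import pyarrow as pa

def subtable_from_group(subgroup, table_schema, partition_cols,
                        preserve_index=False):
    # Hypothetical helper: build a subschema by dropping the partition
    # columns from the original table.schema, then enforce it when the
    # pandas subgroup is converted back into an Arrow table.
    fields = [table_schema[i] for i in range(len(table_schema))
              if table_schema[i].name not in partition_cols]
    subschema = pa.schema(fields)
    return pa.Table.from_pandas(subgroup, schema=subschema,
                                preserve_index=preserve_index)
```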

Here is a simple example of where we are running into issues while trying to 
preserve a valid schema. This use case is more likely to occur when working 
with sparse data sets.

```
>>> from io import StringIO
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq

# in csv col2 has no NaNs and in csv_nan col2 only has NaNs
>>> csv = StringIO('"1","10","100"')
>>> csv_nan = StringIO('"2","","200"')

# read in col2 as a float since pandas does not support NaNs in ints
>>> pd_dtype = {'col1': np.int32, 'col2': np.float32, 'col3': np.int32}
>>> df = pd.read_csv(csv, header=None, names=['col1', 'col2', 'col3'],
...                  dtype=pd_dtype)
>>> df_nan = pd.read_csv(csv_nan, header=None, names=['col1', 'col2', 'col3'],
...                      dtype=pd_dtype)

# verify both dfs and their dtypes
>>> df
   col1  col2  col3
0     1  10.0   100

>>> df.dtypes
col1      int32
col2    float32
col3      int32
dtype: object

>>> df_nan
   col1  col2  col3
0     2   NaN   200

>>> df_nan.dtypes
col1      int32
col2    float32
col3      int32
dtype: object

# define col2 as an int32 since pyarrow does support NaNs in ints
# we want to preserve the original schema we started with and not
# upcast just because we're using pandas to go from csv to pyarrow
>>> schema = pa.schema([pa.field('col1', type=pa.int32()),
...                     pa.field('col2', type=pa.int32()),
...                     pa.field('col3', type=pa.int32())])

# verify schema
>>> schema
col1: int32
col2: int32
col3: int32

# create tables
>>> table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
>>> table_nan = pa.Table.from_pandas(df_nan, schema=schema,
...                                  preserve_index=False)

# verify table schemas and metadata
# col2 has pandas_type int32 and numpy_type float32 in both tables
>>> table
pyarrow.Table
col1: int32
col2: int32
col3: int32
metadata

{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
 b' "col1", "field_name": "col1", "pandas_type": "int32", "numpy_ty'
 b'pe": "int32", "metadata": null}, {"name": "col2", "field_name": '
 b'"col2", "pandas_type": "int32", "numpy_type": "float32", "metada'
 b'ta": null}, {"name": "col3", "field_name": "col3", "pandas_type"'
 b': "int32", "numpy_type": "int32", "metadata": null}], "pandas_ve'
 b'rsion": "0.22.0"}'}

>>> table_nan
pyarrow.Table
col1: int32
col2: int32
col3: int32
metadata

{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
 b' "col1", "field_name": "col1", "pandas_type": "int32", "numpy_ty'
 b'pe": "int32", "metadata": null}, {"name": "col2", "field_name": '
 b'"col2", "pandas_type": "int32", "numpy_type": "float32", "metada'
 b'ta": null}, {"name": "col3", "field_name": "col3", "pandas_type"'
 b': "int32", "numpy_type": "int32", "metadata": null}], "pandas_ve'
 b'rsion": "0.22.0"}'}

# write both tables to local filesystem
>>> pq.write_to_dataset(table, '/Users/jkulzick/pyarrow_example',
...                     partition_cols=['col1'],
...                     preserve_index=False)
>>> pq.write_to_dataset(table_nan, '/Users/jkulzick/pyarrow_example',
...                     partition_cols=['col1'],
...                     preserve_index=False)

# read parquet files into a ParquetDataset to validate the schemas
# the metadata and schemas for both files are different from their original tables
# table now has pandas_type int32 and numpy_type int32 (was float32) for col2
# table_nan now has pandas_type float64 (was int32) and numpy_type int64
# (was float32) for col2
>>> ds = pq.ParquetDataset('/Users/jkulzick/pyarrow_example')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jkulzick/miniconda3/envs/bowerbird/lib/python3.6/site-packages/pyarrow/parquet.py", line 745, in __init__
    self.validate_schemas()
  File "/Users/jkulzick/miniconda3/envs/bowerbird/lib/python3.6/site-packages/pyarrow/parquet.py", line 775, in validate_schemas
    dataset_schema))
ValueError: Schema in partition[col1=1]
```

[jira] [Created] (ARROW-2890) [Plasma] Make Python PlasmaClient.release private

2018-07-20 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2890:
-------------------------------------

 Summary: [Plasma] Make Python PlasmaClient.release private
 Key: ARROW-2890
 URL: https://issues.apache.org/jira/browse/ARROW-2890
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Philipp Moritz


It should normally not be called by the user, since it is automatically called
upon buffer destruction; see also
https://github.com/apache/arrow/blob/7d2fbeba31763c978d260a9771184a13a63aaaf7/python/pyarrow/_plasma.pyx#L222.
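
A minimal sketch of the intended usage pattern (the /tmp/plasma socket path is
an assumption, and client.put() is only used here as a convenient way to
create an object to fetch):

```
import pyarrow.plasma as plasma

client = plasma.connect("/tmp/plasma")   # assumed socket path

object_id = client.put([1, 2, 3])        # store a small Python object
[buf] = client.get_buffers([object_id])  # buf is a PlasmaBuffer

# ... read from buf ...

# Dropping the last reference runs the buffer's destructor, which calls
# client.release() internally, so no explicit release() call is needed.
del buf
```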





Re: Pyarrow Plasma client.release() fault

2018-07-20 Thread Philipp Moritz
Also you should avoid calling release directly, because it will also be
called automatically here:

https://github.com/apache/arrow/blob/master/python/pyarrow/_plasma.pyx#L222

Instead, you should call "del buffer" on the PlasmaBuffer. I'll submit a PR
to make the release method private.

The only real way to see what is going on is to have code to reproduce your
workload (maybe by shrinking the data size we can make it run on a smaller
machine). Can you reproduce the problem with synthetic data?
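
A rough synthetic stand-in for the workload described below might look like
the sketch here (the socket path, the sizes, and the use of the high-level
client.put() instead of record batch streaming are all assumptions; it is only
a starting point for a reproduction):

```
import multiprocessing as mp
import numpy as np
import pandas as pd
import pyarrow.plasma as plasma

# Assumed socket path; start the store beforehand, e.g.:
#   plasma_store -s /tmp/plasma -m 10000000000
PLASMA_SOCKET = "/tmp/plasma"

def worker(seed):
    # One client per process, as recommended (the client is not thread safe).
    client = plasma.connect(PLASMA_SOCKET)
    rng = np.random.RandomState(seed)
    df = pd.DataFrame({"day": rng.randint(0, 5, 100000),
                       "value": rng.randn(100000)})
    ids = []
    # Chunk by day, put each chunk into Plasma, and return only the ids.
    for _, chunk in df.groupby("day"):
        payload = {col: chunk[col].values for col in chunk.columns}
        object_id = client.put(payload)
        ids.append(object_id.binary())  # raw bytes pickle cleanly
    return ids

if __name__ == "__main__":
    with mp.Pool(8) as pool:
        for id_batch in pool.map(worker, range(64)):
            pass  # a follow-on step would regroup ids by day and write parquet
```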

-- Philipp.

On Fri, Jul 20, 2018 at 1:48 PM, Robert Nishihara  wrote:

> Hi Corey,
>
> It is possible that the current eviction policy will evict a ton of objects
> at once. Since the plasma store is single threaded, this could cause the
> plasma store to be unresponsive while the eviction is happening (though it
> should not hang permanently, just temporarily).
>
> You could always try starting the plasma store with a smaller amount of
> memory (using the "-m" flag) and see if that changes things.
>
> Glad to hear that ray is simplifying things.
>
> -Robert
>
> On Fri, Jul 20, 2018 at 1:30 PM Corey Nolet  wrote:
>
> > Robert,
> >
> > Yes I am using separate Plasma clients in each different thread. I also
> > verified that I am not using up all the file descriptors or reaching the
> > overcommit limit.
> >
> > I do see that the Plasma server is evicting objects every so often. I'm
> > assuming this eviction may be going on in the background? Is it possible
> > that the locking up may be the result of a massive eviction? I am
> > allocating over 8TB for the Plasma server.
> >
> > Wes,
> >
> > Best practices would be great. I did find that the @ray.remote scheduler
> > from the Ray project has drastically simplified my code.
> >
> > I also attempted using single-node PySpark but the type conversion I need
> > for going from CSV->Dataframes was orders of magnitude slower than Pandas
> > and Python.
> >
> >
> >
> > On Mon, Jul 16, 2018 at 8:17 PM Wes McKinney 
> wrote:
> >
> > > Seems like we might want to write down some best practices for this
> > > level of large scale usage, essentially a supercomputer-like rig. I
> > > wouldn't even know where to come by a machine with >2TB memory for
> > > scalability / concurrency load testing
> > >
> > > On Mon, Jul 16, 2018 at 2:59 PM, Robert Nishihara
> > >  wrote:
> > > > Are you using the same plasma client from all of the different
> > > > threads? If so, that could cause race conditions as the client is not
> > > > thread safe.
> > > >
> > > > Alternatively, if you have a separate plasma client for each thread,
> > > > then you may be running out of file descriptors somewhere (either the
> > > > client process or the store).
> > > >
> > > > Can you check if the object store is evicting objects (it prints
> > > > something to stdout/stderr when this happens)? Could you be running
> > > > out of memory but failing to release the objects?
> > > >
> > > > On Tue, Jul 10, 2018 at 9:48 AM Corey Nolet 
> wrote:
> > > >
> > > >> Update:
> > > >>
> > > >> I'm investigating the possibility that I've reached the overcommit
> > > limit in
> > > >> the kernel as a result of all the parallel processes.
> > > >>
> > > >> This still doesn't fix the client.release() problem but it might
> > explain
> > > >> why the processing appears to halt, after some time, until I restart
> > the
> > > >> Jupyter kernel.
> > > >>
> > > >> On Tue, Jul 10, 2018 at 12:27 PM Corey Nolet 
> > wrote:
> > > >>
> > > >> > Wes,
> > > >> >
> > > >> > Unfortunately, my code is on a separate network. I'll try to
> explain
> > > what
> > > >> > I'm doing and if you need further detail, I can certainly
> pseudocode
> > > >> > specifics.
> > > >> >
> > > >> > I am using multiprocessing.Pool() to fire up a bunch of threads
> for
> > > >> > different filenames. In each thread, I'm performing a
> pd.read_csv(),
> > > >> > sorting by the timestamp field (rounded to the day) and chunking
> the
> > > >> > Dataframe into separate Dataframes. I create a new Plasma ObjectID
> > for
> > > >> each
> > > >> > of the chunked Dataframes, convert them to RecordBuffer objects,
> > > stream
> > > >> the
> > > >> > bytes to Plasma and seal the objects. Only the objectIDs are
> > returned
> > > to
> > > >> > the orchestration thread.
> > > >> >
> > > >> > In follow-on processing, I'm combining the ObjectIDs for each of
> the
> > > >> > unique day timestamps into lists and I'm passing those into a
> > > function in
> > > >> > parallel using multiprocessing.Pool(). In this function, I'm
> > iterating
> > > >> > through the lists of objectIds, loading them back into Dataframes,
> > > >> > appending them together until their size
> > > >> > is > some predefined threshold, and performing a df.to_parquet().
> > > >> >
> > > >> > The steps in the 2 paragraphs above are performed in a loop,
> > > >> > batching up 500-1k files at a time for each iteration.
> > > >> >
> > > >> > When I run this 

Re: Pyarrow Plasma client.release() fault

2018-07-20 Thread Robert Nishihara
Hi Corey,

It is possible that the current eviction policy will evict a ton of objects
at once. Since the plasma store is single threaded, this could cause the
plasma store to be unresponsive while the eviction is happening (though it
should not hang permanently, just temporarily).

You could always try starting the plasma store with a smaller amount of
memory (using the "-m" flag) and see if that changes things.

Glad to hear that ray is simplifying things.

-Robert

On Fri, Jul 20, 2018 at 1:30 PM Corey Nolet  wrote:

> Robert,
>
> Yes I am using separate Plasma clients in each different thread. I also
> verified that I am not using up all the file descriptors or reaching the
> overcommit limit.
>
> I do see that the Plasma server is evicting objects every so often. I'm
> assuming this eviction may be going on in the background? Is it possible
> that the locking up may be the result of a massive eviction? I am
> allocating over 8TB for the Plasma server.
>
> Wes,
>
> Best practices would be great. I did find that the @ray.remote scheduler
> from the Ray project has drastically simplified my code.
>
> I also attempted using single-node PySpark but the type conversion I need
> for going from CSV->Dataframes was orders of magnitude slower than Pandas
> and Python.
>
>
>
> On Mon, Jul 16, 2018 at 8:17 PM Wes McKinney  wrote:
>
> > Seems like we might want to write down some best practices for this
> > level of large scale usage, essentially a supercomputer-like rig. I
> > wouldn't even know where to come by a machine with >2TB memory for
> > scalability / concurrency load testing
> >
> > On Mon, Jul 16, 2018 at 2:59 PM, Robert Nishihara
> >  wrote:
> > > Are you using the same plasma client from all of the different threads?
> > > If so, that could cause race conditions as the client is not thread safe.
> > >
> > > Alternatively, if you have a separate plasma client for each thread,
> > > then you may be running out of file descriptors somewhere (either the
> > > client process or the store).
> > >
> > > Can you check if the object store is evicting objects (it prints
> > > something to stdout/stderr when this happens)? Could you be running
> > > out of memory but failing to release the objects?
> > >
> > > On Tue, Jul 10, 2018 at 9:48 AM Corey Nolet  wrote:
> > >
> > >> Update:
> > >>
> > >> I'm investigating the possibility that I've reached the overcommit
> > limit in
> > >> the kernel as a result of all the parallel processes.
> > >>
> > >> This still doesn't fix the client.release() problem but it might
> explain
> > >> why the processing appears to halt, after some time, until I restart
> the
> > >> Jupyter kernel.
> > >>
> > >> On Tue, Jul 10, 2018 at 12:27 PM Corey Nolet 
> wrote:
> > >>
> > >> > Wes,
> > >> >
> > >> > Unfortunately, my code is on a separate network. I'll try to explain
> > what
> > >> > I'm doing and if you need further detail, I can certainly pseudocode
> > >> > specifics.
> > >> >
> > >> > I am using multiprocessing.Pool() to fire up a bunch of threads for
> > >> > different filenames. In each thread, I'm performing a pd.read_csv(),
> > >> > sorting by the timestamp field (rounded to the day) and chunking the
> > >> > Dataframe into separate Dataframes. I create a new Plasma ObjectID
> for
> > >> each
> > >> > of the chunked Dataframes, convert them to RecordBuffer objects,
> > stream
> > >> the
> > >> > bytes to Plasma and seal the objects. Only the objectIDs are
> returned
> > to
> > >> > the orchestration thread.
> > >> >
> > >> > In follow-on processing, I'm combining the ObjectIDs for each of the
> > >> > unique day timestamps into lists and I'm passing those into a
> > function in
> > >> > parallel using multiprocessing.Pool(). In this function, I'm
> iterating
> > >> > through the lists of objectIds, loading them back into Dataframes,
> > >> > appending them together until their size
> > >> > is > some predefined threshold, and performing a df.to_parquet().
> > >> >
> > >> > The steps in the 2 paragraphs above are performed in a loop,
> > >> > batching up 500-1k files at a time for each iteration.
> > >> >
> > >> > When I run this iteration a few times, it eventually locks up the
> > Plasma
> > >> > client. With regards to the release() fault, it doesn't seem to
> matter
> > >> when
> > >> > or where I run it (in the orchestration thread or in other threads),
> > it
> > >> > always seems to crash the Jupyter kernel. I'm thinking I might be
> > using
> > >> it
> > >> > wrong, I'm just trying to figure out where and what I'm doing.
> > >> >
> > >> > Thanks again!
> > >> >
> > >> > On Tue, Jul 10, 2018 at 12:05 PM Wes McKinney 
> > >> wrote:
> > >> >
> > >> >> hi Corey,
> > >> >>
> > >> >> Can you provide the code (or a simplified version thereof) that
> shows
> > >> >> how you're using Plasma?
> > >> >>
> > >> >> - Wes
> > >> >>
> > >> >> On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet 
> > >> wrote:
> > >> >> > I'm on a system with 12TB of memory 

Re: Pyarrow Plasma client.release() fault

2018-07-20 Thread Corey Nolet
Robert,

Yes I am using separate Plasma clients in each different thread. I also
verified that I am not using up all the file descriptors or reaching the
overcommit limit.

I do see that the Plasma server is evicting objects every so often. I'm
assuming this eviction may be going on in the background? Is it possible
that the locking up may be the result of a massive eviction? I am
allocating over 8TB for the Plasma server.

Wes,

Best practices would be great. I did find that the @ray.remote scheduler
from the Ray project has drastically simplified my code.

I also attempted using single-node PySpark but the type conversion I need
for going from CSV->Dataframes was orders of magnitude slower than Pandas
and Python.



On Mon, Jul 16, 2018 at 8:17 PM Wes McKinney  wrote:

> Seems like we might want to write down some best practices for this
> level of large scale usage, essentially a supercomputer-like rig. I
> wouldn't even know where to come by a machine with >2TB memory for
> scalability / concurrency load testing
>
> On Mon, Jul 16, 2018 at 2:59 PM, Robert Nishihara
>  wrote:
> > Are you using the same plasma client from all of the different threads?
> > If so, that could cause race conditions as the client is not thread safe.
> >
> > Alternatively, if you have a separate plasma client for each thread, then
> > you may be running out of file descriptors somewhere (either the client
> > process or the store).
> >
> > Can you check if the object store is evicting objects (it prints
> > something to stdout/stderr when this happens)? Could you be running out
> > of memory but failing to release the objects?
> >
> > On Tue, Jul 10, 2018 at 9:48 AM Corey Nolet  wrote:
> >
> >> Update:
> >>
> >> I'm investigating the possibility that I've reached the overcommit
> limit in
> >> the kernel as a result of all the parallel processes.
> >>
> >> This still doesn't fix the client.release() problem but it might explain
> >> why the processing appears to halt, after some time, until I restart the
> >> Jupyter kernel.
> >>
> >> On Tue, Jul 10, 2018 at 12:27 PM Corey Nolet  wrote:
> >>
> >> > Wes,
> >> >
> >> > Unfortunately, my code is on a separate network. I'll try to explain
> what
> >> > I'm doing and if you need further detail, I can certainly pseudocode
> >> > specifics.
> >> >
> >> > I am using multiprocessing.Pool() to fire up a bunch of threads for
> >> > different filenames. In each thread, I'm performing a pd.read_csv(),
> >> > sorting by the timestamp field (rounded to the day) and chunking the
> >> > Dataframe into separate Dataframes. I create a new Plasma ObjectID for
> >> each
> >> > of the chunked Dataframes, convert them to RecordBuffer objects,
> stream
> >> the
> >> > bytes to Plasma and seal the objects. Only the objectIDs are returned
> to
> >> > the orchestration thread.
> >> >
> >> > In follow-on processing, I'm combining the ObjectIDs for each of the
> >> > unique day timestamps into lists and I'm passing those into a
> function in
> >> > parallel using multiprocessing.Pool(). In this function, I'm iterating
> >> > through the lists of objectIds, loading them back into Dataframes,
> >> > appending them together until their size
> >> > is > some predefined threshold, and performing a df.to_parquet().
> >> >
> >> > The steps in the 2 paragraphs above are performed in a loop,
> >> > batching up 500-1k files at a time for each iteration.
> >> >
> >> > When I run this iteration a few times, it eventually locks up the
> Plasma
> >> > client. With regards to the release() fault, it doesn't seem to matter
> >> when
> >> > or where I run it (in the orchestration thread or in other threads),
> it
> >> > always seems to crash the Jupyter kernel. I'm thinking I might be
> using
> >> it
> >> > wrong, I'm just trying to figure out where and what I'm doing.
> >> >
> >> > Thanks again!
> >> >
> >> > On Tue, Jul 10, 2018 at 12:05 PM Wes McKinney 
> >> wrote:
> >> >
> >> >> hi Corey,
> >> >>
> >> >> Can you provide the code (or a simplified version thereof) that shows
> >> >> how you're using Plasma?
> >> >>
> >> >> - Wes
> >> >>
> >> >> On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet 
> >> wrote:
> >> >> > I'm on a system with 12TB of memory and attempting to use Pyarrow's
> >> >> Plasma
> >> >> > client to convert a series of CSV files (via Pandas) into a Parquet
> >> >> store.
> >> >> >
> >> >> > I've got a little over 20k CSV files to process which are about
> 1-2gb
> >> >> each.
> >> >> > I'm loading 500 to 1000 files at a time.
> >> >> >
> >> >> > In each iteration, I'm loading a series of files, partitioning them
> >> by a
> >> >> > time field into separate dataframes, then writing parquet files in
> >> >> > directories for each day.
> >> >> >
> >> >> > The problem I'm having is that the Plasma client & server appear to
> >> >> lock up
> >> >> > after about 2-3 iterations. It locks up to the point where I can't
> >> even
> >> >> > CTRL+C the server. I am able to stop the notebook and 

[jira] [Created] (ARROW-2889) [C++] Add optional argument to ADD_ARROW_TEST CMake function to add unit test prefix

2018-07-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2889:
-------------------------------------

 Summary: [C++] Add optional argument to ADD_ARROW_TEST CMake 
function to add unit test prefix
 Key: ARROW-2889
 URL: https://issues.apache.org/jira/browse/ARROW-2889
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


We are already adding prefixes manually in the filenames (example:
https://github.com/apache/arrow/tree/master/cpp/src/arrow/ipc). This can be a
bit awkward at times. It may be useful to have an option like {{PREFIX "ipc-"}}
or {{PREFIX "plasma-"}} to make the unit tests related to a project component
easier to identify.





[jira] [Created] (ARROW-2888) [Plasma] Several GPU-related APIs are used in places where errors cannot be appropriately handled

2018-07-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2888:
-------------------------------------

 Summary: [Plasma] Several GPU-related APIs are used in places 
where errors cannot be appropriately handled
 Key: ARROW-2888
 URL: https://issues.apache.org/jira/browse/ARROW-2888
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Reporter: Wes McKinney


I'm adding {{DCHECK_OK}} statements for ARROW-2883 to fix the unchecked Status
warnings, but this code should be refactored so that these errors can bubble up
properly.





[jira] [Created] (ARROW-2887) [Plasma] Methods in plasma/store.h returning PlasmaError should return Status instead

2018-07-20 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2887:
-------------------------------------

 Summary: [Plasma] Methods in plasma/store.h returning PlasmaError 
should return Status instead
 Key: ARROW-2887
 URL: https://issues.apache.org/jira/browse/ARROW-2887
 Project: Apache Arrow
  Issue Type: Bug
  Components: Plasma (C++)
Reporter: Wes McKinney


These functions are not able to return other kinds of errors (e.g. CUDA-related
errors) as a result of this. I encountered this while working on ARROW-2883.





Re: Arrow stickers

2018-07-20 Thread Jacques Nadeau
I think options 1-3 should all be okay. Do we need to run a formal vote to
approve? It seems like they are just the logo and the name so I'm not sure
anything further is needed. Items 4-6 introduce colors and I think we
should probably do further exploration before adding more colors to the
"Arrow palette" (if you will).

On Thu, Jul 19, 2018 at 5:36 PM, Wes McKinney  wrote:

> hi Kelly,
>
> I'm personally fine with options 1 through 3. It would be great if we
> could fit option 1 on a hex sticker
>
> Anyone else have comments?
> https://docs.google.com/document/d/1DT1Zexs56Et3GZGpieAUXeaot9U69_xZAX4_odGhfSs/edit
>
> - Wes
>
> On Tue, Jul 10, 2018 at 11:41 AM, Kelly Stirman  wrote:
> > I updated the images in the doc to include Apache. Have a look.
> >
> > On Tue, Jul 10, 2018 at 7:59 AM, Julian Hyde wrote:
> >
> >> Thanks for driving this.
> >>
> >> Can you put the word “apache” in there (in smaller font if you like).
> >> That way, if you have the logo on slide 1 of your presentation, you’ve
> >> already done your duty to mention the Apache brand.
> >>
> >> Julian
> >>
> >> > On Jul 9, 2018, at 19:07, Kelly Stirman  wrote:
> >> >
> >> > Hi everyone!
> >> >
> >> > I had our designer put together some logo options. I put them together
> >> > in this doc
> >> > <https://docs.google.com/document/d/1DT1Zexs56Et3GZGpieAUXeaot9U69_xZAX4_odGhfSs/edit>
> >> > where you can leave feedback, or in this thread. I did some quick
> >> > outlines to make it so you can imagine what they would look like as
> >> > cut-out stickers.
> >> >
> >> > I think they would all work as hexagons too, and both options are
> >> > probably good to have.
> >> >
> >> > Hope this helps.
> >> >
> >> > Kelly
> >>
> >
> >
> >
> > --
> > Kelly Stirman
> > CMO, Dremio
> > +1.267.496.2759
> > @kstirman
>