[jira] [Commented] (ARROW-1773) [C++] Add casts from date/time types to compatible signed integers

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257947#comment-16257947
 ] 

ASF GitHub Bot commented on ARROW-1773:
---

Licht-T commented on issue #1310: ARROW-1773: [C++] Add casts from date/time 
types to compatible signed integers
URL: https://github.com/apache/arrow/pull/1310#issuecomment-345421309
 
 
   Thanks @wesm!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add casts from date/time types to compatible signed integers
> --
>
> Key: ARROW-1773
> URL: https://issues.apache.org/jira/browse/ARROW-1773
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wes McKinney
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> e.g.
> {code}
> In [3]: arr = pa.array([1,2,3], type='i4')
> In [4]: arr.cast('date32')
> Out[4]: 
> 
> [
>   datetime.date(1970, 1, 2),
>   datetime.date(1970, 1, 3),
>   datetime.date(1970, 1, 4)
> ]
> In [5]: arr.cast('date32').cast('i4')
> ---
> ArrowNotImplementedError  Traceback (most recent call last)
>  in ()
> > 1 arr.cast('date32').cast('i4')
> /home/wesm/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.cast 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:28923)()
> 266 
> 267 with nogil:
> --> 268 check_status(Cast(_context(), self.ap[0], type.sp_type,
> 269   options, ))
> 270 
> /home/wesm/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:8306)()
>  83 raise ArrowKeyError(message)
>  84 elif status.IsNotImplemented():
> ---> 85 raise ArrowNotImplementedError(message)
>  86 elif status.IsTypeError():
>  87 raise ArrowTypeError(message)
> ArrowNotImplementedError: 
> /home/wesm/code/arrow/cpp/src/arrow/compute/cast.cc:920 code: 
> GetCastFunction(*array.type(), out_type, options, )
> No cast implemented from date32[day] to int32
> {code}
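For context, a date32 value is just a count of days since the UNIX epoch (1970-01-01), which is what makes a cast to and from int32 natural. A minimal stdlib sketch of that mapping (pyarrow not required; helper names are illustrative, not pyarrow API):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def date32_to_date(days):
    """Interpret an int32 'days since epoch' value as a calendar date."""
    return EPOCH + timedelta(days=days)

def date_to_date32(d):
    """Inverse mapping: calendar date back to its int32 representation."""
    return (d - EPOCH).days

# Mirrors the pyarrow session above: [1, 2, 3] -> Jan 2/3/4 of 1970
dates = [date32_to_date(n) for n in [1, 2, 3]]
print(dates)  # [datetime.date(1970, 1, 2), datetime.date(1970, 1, 3), datetime.date(1970, 1, 4)]
print([date_to_date32(d) for d in dates])  # [1, 2, 3]
```

Because the stored bytes for date32 and int32 are identical and only the interpretation differs, the requested cast can be implemented without copying.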



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1773) [C++] Add casts from date/time types to compatible signed integers

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257937#comment-16257937
 ] 

ASF GitHub Bot commented on ARROW-1773:
---

wesm closed pull request #1310: ARROW-1773: [C++] Add casts from date/time 
types to compatible signed integers
URL: https://github.com/apache/arrow/pull/1310
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/cpp/src/arrow/compute/compute-test.cc 
b/cpp/src/arrow/compute/compute-test.cc
index 5eada398d..58a991c60 100644
--- a/cpp/src/arrow/compute/compute-test.cc
+++ b/cpp/src/arrow/compute/compute-test.cc
@@ -334,6 +334,9 @@ TEST_F(TestCast, TimestampToTimestamp) {
   );
   CheckZeroCopy(*arr, timestamp(TimeUnit::SECOND));
 
+  // ARROW-1773, cast to integer
+  CheckZeroCopy(*arr, int64());
+
   // Divide, truncate
   vector v8 = {0, 100123, 200456, 1123, 2456};
   vector e8 = {0, 100, 200, 1, 2};
@@ -432,7 +435,7 @@ TEST_F(TestCast, TimestampToDate32_Date64) {
   timestamp(TimeUnit::SECOND), v_second_nofail, is_valid, date32(), v_day, 
options);
 }
 
-TEST_F(TestCast, TimeToTime) {
+TEST_F(TestCast, TimeToCompatible) {
   CastOptions options;
 
   vector is_valid = {true, false, true, true, true};
@@ -474,6 +477,16 @@ TEST_F(TestCast, TimeToTime) {
   ArrayFromVector(time64(TimeUnit::MICRO), is_valid, v7, 
);
   CheckZeroCopy(*arr, time64(TimeUnit::MICRO));
 
+  // ARROW-1773: cast to int64
+  CheckZeroCopy(*arr, int64());
+
+  vector v7_2 = {0, 7, 2000, 1000, 0};
+  ArrayFromVector(time32(TimeUnit::SECOND), is_valid, 
v7_2, );
+  CheckZeroCopy(*arr, time32(TimeUnit::SECOND));
+
+  // ARROW-1773: cast to int32
+  CheckZeroCopy(*arr, int32());
+
   // Divide, truncate
   vector v8 = {0, 100123, 200456, 1123, 2456};
   vector e8 = {0, 100, 200, 1, 2};
@@ -515,7 +528,7 @@ TEST_F(TestCast, TimeToTime) {
  options);
 }
 
-TEST_F(TestCast, DateToDate) {
+TEST_F(TestCast, DateToCompatible) {
   CastOptions options;
 
   vector is_valid = {true, false, true, true, true};
@@ -535,9 +548,15 @@ TEST_F(TestCast, DateToDate) {
   ArrayFromVector(date32(), is_valid, v2, );
   CheckZeroCopy(*arr, date32());
 
+  // ARROW-1773: zero copy cast to integer
+  CheckZeroCopy(*arr, int32());
+
   ArrayFromVector(date64(), is_valid, v3, );
   CheckZeroCopy(*arr, date64());
 
+  // ARROW-1773: zero copy cast to integer
+  CheckZeroCopy(*arr, int64());
+
   // Divide, truncate
   vector v8 = {0, 100 * F + 123, 200 * F + 456, F + 123, 2 * F + 456};
   vector e8 = {0, 100, 200, 1, 2};
diff --git a/cpp/src/arrow/compute/kernels/cast.cc 
b/cpp/src/arrow/compute/kernels/cast.cc
index 6a42ec8b2..c866054ea 100644
--- a/cpp/src/arrow/compute/kernels/cast.cc
+++ b/cpp/src/arrow/compute/kernels/cast.cc
@@ -91,10 +91,14 @@ struct is_zero_copy_cast<
 // From integers to date/time types with zero copy
 template 
 struct is_zero_copy_cast<
-O, I, typename std::enable_if::value &&
-  (std::is_base_of::value ||
-   std::is_base_of::value ||
-   std::is_base_of::value)>::type> {
+O, I,
+typename std::enable_if<
+(std::is_base_of::value &&
+ (std::is_base_of::value || std::is_base_of::value ||
+  std::is_base_of::value)) ||
+(std::is_base_of::value &&
+ (std::is_base_of::value || std::is_base_of::value ||
+  std::is_base_of::value))>::type> {
   using O_T = typename O::c_type;
   using I_T = typename I::c_type;
 
@@ -809,24 +813,29 @@ class CastKernel : public UnaryKernel {
 
 #define DATE32_CASES(FN, IN_TYPE) \
   FN(Date32Type, Date32Type); \
-  FN(Date32Type, Date64Type);
+  FN(Date32Type, Date64Type); \
+  FN(Date32Type, Int32Type);
 
 #define DATE64_CASES(FN, IN_TYPE) \
   FN(Date64Type, Date64Type); \
-  FN(Date64Type, Date32Type);
+  FN(Date64Type, Date32Type); \
+  FN(Date64Type, Int64Type);
 
 #define TIME32_CASES(FN, IN_TYPE) \
   FN(Time32Type, Time32Type); \
-  FN(Time32Type, Time64Type);
+  FN(Time32Type, Time64Type); \
+  FN(Time32Type, Int32Type);
 
 #define TIME64_CASES(FN, IN_TYPE) \
   FN(Time64Type, Time32Type); \
-  FN(Time64Type, Time64Type);
+  FN(Time64Type, Time64Type); \
+  FN(Time64Type, Int64Type);
 
 #define TIMESTAMP_CASES(FN, IN_TYPE) \
   FN(TimestampType, TimestampType);  \
   FN(TimestampType, Date32Type); \
-  
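The CheckZeroCopy assertions in the test diff above verify that these casts reuse the input buffer rather than copying it. As a rough illustration of the idea (stdlib Python, not Arrow code), reinterpreting a buffer of 32-bit day counts as plain int32 values requires no copy:

```python
import array

# Five date32 values (days since epoch) stored as raw 32-bit ints
days = array.array('i', [0, 1, 2, 100, 10000])

# A zero-copy "cast": reinterpret the same bytes as int32. memoryview wraps
# the existing buffer; round-tripping through raw bytes ('B') never copies.
as_int32 = memoryview(days).cast('B').cast('i')

print(as_int32.tolist())  # [0, 1, 2, 100, 10000]

# Writes through the view are visible in the original, proving no copy was made
as_int32[0] = 7
print(days[0])            # 7
```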

[jira] [Resolved] (ARROW-1773) [C++] Add casts from date/time types to compatible signed integers

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1773.
-
   Resolution: Fixed
Fix Version/s: 0.8.0

Issue resolved by pull request 1310
[https://github.com/apache/arrow/pull/1310]






[jira] [Commented] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257935#comment-16257935
 ] 

ASF GitHub Bot commented on ARROW-1575:
---

wesm commented on issue #1329: ARROW-1575: [Python] Add tests for 
pyarrow.column factory function
URL: https://github.com/apache/arrow/pull/1329#issuecomment-345420382
 
 
   Build only failing due to non-deterministic Plasma failure




> [Python] Add pyarrow.column factory function
> 
>
> Key: ARROW-1575
> URL: https://issues.apache.org/jira/browse/ARROW-1575
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This would internally call {{Column.from_array}} as appropriate
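A hypothetical sketch of such a factory, showing the dispatch-to-`from_array` shape with stand-in classes (this is illustrative only, not the actual pyarrow implementation):

```python
class Column:
    """Stand-in for pyarrow's Column: a named sequence of values."""
    def __init__(self, name, data):
        self.name = name
        self.data = list(data)

    @classmethod
    def from_array(cls, name, arr):
        return cls(name, arr)

def column(name, values):
    """Factory: accept an existing Column or raw values and normalize
    by delegating to Column.from_array, as the description suggests."""
    if isinstance(values, Column):
        return Column.from_array(name, values.data)
    return Column.from_array(name, values)

c = column('ints', [1, 2, 3])
print(c.name, c.data)  # ints [1, 2, 3]
```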





[jira] [Resolved] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1791.
-
Resolution: Fixed

Resolved by PR 
https://github.com/apache/arrow/commit/202e6503cd2e941a2df2ccacedf09611d8ad2e0f

> Integration tests generate date[DAY] values outside of reasonable range
> ---
>
> Key: ARROW-1791
> URL: https://issues.apache.org/jira/browse/ARROW-1791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The integration tests are generating random int32 values, but for systems 
> that use millisecond-based date objects (like JavaScript), converting to a 
> millisecond date will overflow in many cases. We should generate values 
> within a reasonable year range so that overflows do not occur when 
> converting to milliseconds.
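The magnitude of the problem is easy to check: ECMAScript limits a Date's time value to ±8.64e15 milliseconds (about ±100,000,000 days), while a random int32 day count can exceed two billion days. Plain arithmetic, assuming only that spec limit:

```python
MS_PER_DAY = 86_400_000
JS_MAX_MS = 8.64e15      # ECMAScript time value limit (+/- 8.64e15 ms)
INT32_MAX = 2**31 - 1    # largest day value an unconstrained generator can emit

# An int32 day count converted to milliseconds lands far outside the JS range
overflow_ms = INT32_MAX * MS_PER_DAY
print(overflow_ms > JS_MAX_MS)  # True

# A "reasonable year range" stays safe: ~600 years of days fits easily
safe_days = 600 * 366
print(safe_days * MS_PER_DAY < JS_MAX_MS)  # True
```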





[jira] [Commented] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257931#comment-16257931
 ] 

ASF GitHub Bot commented on ARROW-1791:
---

wesm commented on issue #1328: ARROW-1791: Limit generated data range to 
physical limits for temporal types
URL: https://github.com/apache/arrow/pull/1328#issuecomment-345420329
 
 
   +1









[jira] [Created] (ARROW-1831) [Python] Docker-based documentation build does not properly set LD_LIBRARY_PATH

2017-11-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1831:
---

 Summary: [Python] Docker-based documentation build does not 
properly set LD_LIBRARY_PATH
 Key: ARROW-1831
 URL: https://issues.apache.org/jira/browse/ARROW-1831
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.8.0


see https://github.com/apache/arrow/issues/1324





[jira] [Updated] (ARROW-1391) [Python] Benchmarks for python serialization

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1391:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [Python] Benchmarks for python serialization
> 
>
> Key: ARROW-1391
> URL: https://issues.apache.org/jira/browse/ARROW-1391
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Philipp Moritz
>Priority: Minor
> Fix For: 0.9.0
>
>
> It would be great to have a suite of relevant benchmarks for the Python 
> serialization code in ARROW-759. These could be used to guide profiling and 
> performance improvements.
> Relevant use cases include:
> - dictionaries of large numpy arrays that are used to represent weights of a 
> neural network
> - long lists of primitive types like ints, floats or strings
> - lists of user defined python objects





[jira] [Commented] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257879#comment-16257879
 ] 

Wes McKinney commented on ARROW-1830:
-

It appears we should relax the constraint that Parquet files end in {{.parq}} or 
{{.parquet}}. What system wrote the Parquet files?

> [Python] Error when loading all the files in a dictionary
> -
>
> Key: ARROW-1830
> URL: https://issues.apache.org/jira/browse/ARROW-1830
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
> Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18)  + 
> pyarrow 0.7.1
>Reporter: DB Tsai
> Fix For: 0.8.0
>
>
> I can read one parquet file, but when I tried to read all the parquet files 
> in a folder, I got an error.
> {code:java}
> >>> data = pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
> >>> data = pq.ParquetDataset('./aaa/')
> Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
> in __init__
> self.validate_schemas()
>   File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
> in validate_schemas
> self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> >>> 
> {code}
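The extension constraint under discussion can be sketched as follows; writers such as Spark emit extensionless part files like the one in the traceback above, which is why relaxing it matters (the helper name is hypothetical, not pyarrow API):

```python
PARQUET_SUFFIXES = ('.parq', '.parquet')

def looks_like_parquet(path):
    """Current-style check: accept only conventional Parquet extensions.

    Extensionless part files such as 'part-0-<uuid>' are wrongly
    rejected by this check, which is the motivation for relaxing it.
    """
    return path.lower().endswith(PARQUET_SUFFIXES)

print(looks_like_parquet('data/file.parquet'))  # True
print(looks_like_parquet('aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86'))  # False
```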





[jira] [Updated] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1830:

Fix Version/s: 0.8.0






[jira] [Resolved] (ARROW-1805) [Python] ignore non-parquet files when exploring dataset

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1805.
-
Resolution: Fixed

Issue resolved by pull request 1314
[https://github.com/apache/arrow/pull/1314]

> [Python] ignore non-parquet files when exploring dataset
> 
>
> Key: ARROW-1805
> URL: https://issues.apache.org/jira/browse/ARROW-1805
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Manuel Valdés
>Assignee: Manuel Valdés
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When exploring a ParquetDataset, some files 
> (_metadata,_common_metadata,_SUCCESS) should be ignored when determining if a 
> directory follows a valid structure
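The filtering described above amounts to a predicate over file names; a stdlib sketch with assumed names rather than the exact pyarrow helpers:

```python
def is_private_file(name):
    """True for Spark/Hadoop side-files such as _metadata, _common_metadata,
    _SUCCESS, hidden files, and checksum files, which should not count as
    data files when validating a dataset directory's structure."""
    return name.startswith(('_', '.')) or name.endswith('.crc')

files = ['part-00000.parquet', '_SUCCESS', '_metadata', '_common_metadata',
         '_SUCCESS.crc', 'part-00001.parquet']
data_files = [f for f in files if not is_private_file(f)]
print(data_files)  # ['part-00000.parquet', 'part-00001.parquet']
```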





[jira] [Commented] (ARROW-1805) [Python] ignore non-parquet files when exploring dataset

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257877#comment-16257877
 ] 

ASF GitHub Bot commented on ARROW-1805:
---

wesm closed pull request #1314: ARROW-1805: [Python] Ignore special private 
files when traversing ParquetDataset
URL: https://github.com/apache/arrow/pull/1314
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index 9e0749bb3..3023e1771 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -573,7 +573,7 @@ def _visit_level(self, level, base_path, part_keys):
 filtered_files.sort()
 filtered_directories.sort()
 
-if len(files) > 0 and len(filtered_directories) > 0:
+if len(filtered_files) > 0 and len(filtered_directories) > 0:
 raise ValueError('Found files in an intermediate '
  'directory: {0}'.format(base_path))
 elif len(filtered_directories) > 0:
diff --git a/python/pyarrow/tests/test_parquet.py 
b/python/pyarrow/tests/test_parquet.py
index 1df80acc0..522815fce 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -1027,8 +1027,11 @@ def _visit_level(base_dir, level, part_keys):
 with fs.open(file_path, 'wb') as f:
 _write_table(part_table, f)
 assert fs.exists(file_path)
+
+_touch(pjoin(level_dir, '_SUCCESS'))
 else:
 _visit_level(level_dir, level + 1, this_part_keys)
+_touch(pjoin(level_dir, '_SUCCESS'))
 
 _visit_level(base_dir, 0, [])
 
@@ -1101,6 +1104,11 @@ def _filter_partition(df, part_keys):
 return df[predicate].drop(to_drop, axis=1)
 
 
+def _touch(path):
+with open(path, 'wb'):
+pass
+
+
 @parquet
 def test_read_multiple_files(tmpdir):
 import pyarrow.parquet as pq
@@ -1128,8 +1136,7 @@ def test_read_multiple_files(tmpdir):
 paths.append(path)
 
 # Write a _SUCCESS.crc file
-with open(pjoin(dirpath, '_SUCCESS.crc'), 'wb') as f:
-f.write(b'0')
+_touch(pjoin(dirpath, '_SUCCESS.crc'))
 
 def read_multiple_files(paths, columns=None, nthreads=None, **kwargs):
 dataset = pq.ParquetDataset(paths, **kwargs)


 









[jira] [Commented] (ARROW-1628) [Python] Incorrect serialization of numpy datetimes.

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257876#comment-16257876
 ] 

Wes McKinney commented on ARROW-1628:
-

Removing this from 0.8.0 unless it's urgent. There are some annoying things here, 
because datetime64 objects can have different metadata:

{code}
In [4]: arr1 = np.array([datetime(2000, 1, 1)], dtype='datetime64')

In [5]: arr1
Out[5]: array(['2000-01-01'], dtype='datetime64[D]')

In [6]: arr1[0]
Out[6]: numpy.datetime64('2000-01-01')

In [7]: arr1[0].dtype
Out[7]: dtype('<M8[D]')
{code}

> [Python] Incorrect serialization of numpy datetimes.
> 
>
> Key: ARROW-1628
> URL: https://issues.apache.org/jira/browse/ARROW-1628
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Robert Nishihara
> Fix For: 0.9.0
>
>
> See https://github.com/ray-project/ray/issues/1041.
> The issue can be reproduced as follows.
> {code}
> import datetime
> import pyarrow as pa
> import numpy as np
> t = np.datetime64(datetime.datetime.now())
> print(type(t), t)  # <class 'numpy.datetime64'> 2017-09-30T09:50:46.089952
> t_new = pa.deserialize(pa.serialize(t).to_buffer())
> print(type(t_new), t_new)  # <class 'int'> 0
> {code}
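The underlying difficulty is that a datetime64 scalar is an integer plus unit metadata ('D', 'us', ...), and a serializer that drops the unit yields the bare integer seen above. A stdlib sketch of carrying the unit with the value (the encoding is illustrative only, not pyarrow's serialization format):

```python
import json

def serialize_dt64(value, unit):
    """Encode an integer timestamp together with its unit tag so the
    round trip preserves both pieces of the datetime64."""
    return json.dumps({"datetime64": value, "unit": unit})

def deserialize_dt64(payload):
    obj = json.loads(payload)
    return obj["datetime64"], obj["unit"]

# 10957 days since the epoch is 2000-01-01 at unit 'D'
value, unit = deserialize_dt64(serialize_dt64(10957, "D"))
print(value, unit)  # 10957 D
```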





[jira] [Updated] (ARROW-1628) [Python] Incorrect serialization of numpy datetimes.

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1628:

Fix Version/s: (was: 0.8.0)
   0.9.0






[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257872#comment-16257872
 ] 

Wes McKinney commented on ARROW-1710:
-

I would also propose to remove the Nullable prefix and add "dirty" accessor 
methods for users who are working with data without nulls (or that can be used 
on the hot path when you see that the null count for a vector is 0)
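The checked/unchecked accessor split can be sketched with a toy validity bitmap; the class and method names here are hypothetical, not the Arrow Java API:

```python
class Int64Vector:
    """Toy vector with a validity bitmap, mimicking the proposed split."""
    def __init__(self, values, validity):
        self.values = values
        self.validity = validity            # one bool per slot
        self.null_count = validity.count(False)

    def get(self, i):
        """Checked accessor: honors the validity bitmap."""
        if not self.validity[i]:
            raise ValueError(f"slot {i} is null")
        return self.values[i]

    def get_unsafe(self, i):
        """'Dirty' accessor: skips the null check. Only safe to call when
        null_count == 0 -- the hot path described in the comment."""
        return self.values[i]

v = Int64Vector([10, 20, 30], [True, True, True])
if v.null_count == 0:
    # Tight loop can skip per-element null checks entirely
    print(sum(v.get_unsafe(i) for i in range(3)))  # 60
```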

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
> Fix For: 0.8.0
>
>
> So far the consensus seems to be remove all non-nullable vectors. 





[jira] [Commented] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257871#comment-16257871
 ] 

ASF GitHub Bot commented on ARROW-1575:
---

wesm opened a new pull request #1329: ARROW-1575: [Python] Add tests for 
pyarrow.column factory function
URL: https://github.com/apache/arrow/pull/1329
 
 
   









[jira] [Updated] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1575:
--
Labels: pull-request-available  (was: )






[jira] [Commented] (ARROW-1805) [Python] ignore non-parquet files when exploring dataset

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257870#comment-16257870
 ] 

ASF GitHub Bot commented on ARROW-1805:
---

wesm commented on issue #1314: ARROW-1805: [Python] Ignore special private 
files when traversing ParquetDataset
URL: https://github.com/apache/arrow/pull/1314#issuecomment-345410541
 
 
   Plasma tests are failing here on macOS, @robertnishihara or @pcmoritz can 
you take a look?









[jira] [Created] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread DB Tsai (JIRA)
DB Tsai created ARROW-1830:
--

 Summary: [Python] Error when loading all the files in a dictionary
 Key: ARROW-1830
 URL: https://issues.apache.org/jira/browse/ARROW-1830
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.1
 Environment: Python 2.7.11 (default, Jan 22 2016, 08:29:18)  + pyarrow 
0.7.1
Reporter: DB Tsai


I can read one parquet file, but when I tried to read all the parquet files in 
a folder, I got an error.

{code:python}
>>> data = pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
>>> data = pq.ParquetDataset('./aaa/')
Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
in __init__
self.validate_schemas()
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
in validate_schemas
self.schema = self.pieces[0].get_metadata(open_file).schema
IndexError: list index out of range
>>> 
{code}






[jira] [Updated] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated ARROW-1830:
---
Description: 
I can read one parquet file, but when I tried to read all the parquet files in 
a folder, I got an error.

{code:java}
>>> data = 
>>> pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
>>> data = pq.ParquetDataset('./aaa/')
Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
in __init__
self.validate_schemas()
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
in validate_schemas
self.schema = self.pieces[0].get_metadata(open_file).schema
IndexError: list index out of range
>>> 
{code}


  was:
I can read one parquet file, but when I tried to read all the parquet files in 
a folder, I got an error.

{code:python}
>>> data = 
>>> pq.ParquetDataset('./aaa/part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86')
>>> data = pq.ParquetDataset('./aaa/')
Ignoring path: ./aaa//part-0-d8268e3a-4e65-41a3-a43e-01e0bf68ee86
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 638, 
in __init__
self.validate_schemas()
  File "/usr/local/lib/python2.7/site-packages/pyarrow/parquet.py", line 647, 
in validate_schemas
self.schema = self.pieces[0].get_metadata(open_file).schema
IndexError: list index out of range
>>> 
{code}






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257857#comment-16257857
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

wesm commented on issue #1327: ARROW-1795: [Plasma] Create flag to make Plasma 
store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345408634
 
 
   Most recent build stalled on macOS; we should keep an eye on whether it
becomes a recurring issue.
   
   ```
   2: [ RUN  ] TestPlasmaStore.MultipleClientTest
   No output has been received in the last 10m0s, this potentially indicates a 
stalled build or something wrong with the build itself.
   Check the details on how to adjust your build configuration on: 
https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
   The build has been terminated
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go
to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: suppose the store has 8 GB of memory in total and already holds 5 GB 
> of objects, leaving 3 GB free. When a 6 GB object arrives, asking the policy 
> to evict the full 6 GB throws an exception saying no object can be freed, 
> because only 5 GB of objects exist. If the 3 GB of remaining free space is 
> counted, we only need to evict 6 - 3 = 3 GB; evicting the 5 GB of objects 
> satisfies that, and the store survives.
> Case 2: the store has 10 GB of memory and holds 1.5 GB of objects. When 
> another 9 GB of data arrives, evicting 10 GB * 20% = 2 GB still crashes the 
> store. In this situation we need to evict 9 + 1.5 - 10 = 0.5 GB.
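In both cases the required eviction amount reduces to the shortfall after counting memory that is already free. A minimal sketch of that rule (`bytes_to_evict` is a hypothetical helper for illustration, not the actual Plasma eviction code, and it uses GB where the real store tracks bytes):

```python
def bytes_to_evict(request, capacity, used):
    """Amount to evict: the request's shortfall after counting
    already-free memory, never negative."""
    free = capacity - used
    return max(0, request - free)

# Case 1: 8 GB store holding 5 GB; a 6 GB request needs only 3 GB evicted,
# which the 5 GB of stored objects can satisfy.
assert bytes_to_evict(6, 8, 5) == 3
# Case 2: 10 GB store holding 1.5 GB; a 9 GB request needs only 0.5 GB,
# while a fixed 20% of capacity (2 GB) exceeds the 1.5 GB of evictable objects.
assert bytes_to_evict(9, 10, 1.5) == 0.5
```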





[jira] [Assigned] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1575:
---

Assignee: Wes McKinney

> [Python] Add pyarrow.column factory function
> 
>
> Key: ARROW-1575
> URL: https://issues.apache.org/jira/browse/ARROW-1575
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> This would internally call {{Column.from_array}} as appropriate





[jira] [Commented] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257854#comment-16257854
 ] 

ASF GitHub Bot commented on ARROW-1791:
---

trxcllnt commented on issue #1328: ARROW-1791: Limit generated data range to 
physical limits for temporal types
URL: https://github.com/apache/arrow/pull/1328#issuecomment-345408100
 
 
   Yep, the Date ranges work great for JS. For timestamps, we just return the 
literal values w/o casting them as Dates at the moment, so they don't need to 
be clamped to 53 bits.




> Integration tests generate date[DAY] values outside of reasonable range
> ---
>
> Key: ARROW-1791
> URL: https://issues.apache.org/jira/browse/ARROW-1791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The integration tests are generating random int32 values, but for systems 
> that use millisecond-based date objects (like JavaScript), converting to 
> millisecond date will cause an overflow in a lot of cases. We should generate 
> values that are within a reasonable year range so that overflows when 
> converting to milliseconds do not occur
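The physical limit in question: a JavaScript Date spans at most ±8.64e15 ms (±100,000,000 days) from the epoch, while a random int32 day count can reach roughly ±2.1e9 days. A sketch of clamping the generated range (`random_safe_date32` is a hypothetical name; the actual fix may choose an even narrower "reasonable year" window):

```python
import random

JS_MAX_MS = 8.64e15           # limit of the JavaScript Date range, in ms
MS_PER_DAY = 86_400_000
MAX_SAFE_DAYS = int(JS_MAX_MS // MS_PER_DAY)  # 100,000,000 days

def random_safe_date32():
    # Generate a date32 day count whose millisecond conversion
    # (day * MS_PER_DAY) stays within the JS-representable range.
    return random.randint(-MAX_SAFE_DAYS, MAX_SAFE_DAYS)

d = random_safe_date32()
assert abs(d * MS_PER_DAY) <= JS_MAX_MS
```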





[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257849#comment-16257849
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-345407725
 
 
   @siddharthteotia is this something you would like to run with the Dremio 
suite of tests before merging?




> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 
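The decoupling described above can be illustrated with generic length-prefix framing, the simplest stand-in for a transport's own message boundaries (hypothetical helpers sketched in Python for brevity; the real work targets the Java and C++ writers, and a gRPC transport would wrap each blob in a message like ArrowMessagePB instead):

```python
import struct

def frame(messages):
    """Frame each component message (schema, dictionary batch, record
    batch) as a 4-byte little-endian length prefix plus its bytes."""
    out = bytearray()
    for m in messages:
        out += struct.pack('<I', len(m))
        out += m
    return bytes(out)

def unframe(buf):
    """Recover the component messages from a length-prefixed buffer."""
    msgs, offset = [], 0
    while offset < len(buf):
        (n,) = struct.unpack_from('<I', buf, offset)
        offset += 4
        msgs.append(buf[offset:offset + n])
        offset += n
    return msgs

parts = [b'schema', b'dictionary-batch', b'record-batch-0']
assert unframe(frame(parts)) == parts
```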





[jira] [Assigned] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-11-17 Thread Bryan Cutler (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned ARROW-1710:
---

Assignee: Bryan Cutler

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
>Assignee: Bryan Cutler
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors.





[jira] [Commented] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257848#comment-16257848
 ] 

ASF GitHub Bot commented on ARROW-1791:
---

wesm opened a new pull request #1328: ARROW-1791: Limit generated data range to 
physical limits for temporal types
URL: https://github.com/apache/arrow/pull/1328
 
 
   cc @trxcllnt, do these time ranges seem reasonable?




> Integration tests generate date[DAY] values outside of reasonable range
> ---
>
> Key: ARROW-1791
> URL: https://issues.apache.org/jira/browse/ARROW-1791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The integration tests are generating random int32 values, but for systems 
> that use millisecond-based date objects (like JavaScript), converting to 
> millisecond date will cause an overflow in a lot of cases. We should generate 
> values that are within a reasonable year range so that overflows when 
> converting to milliseconds do not occur





[jira] [Updated] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1791:
--
Labels: pull-request-available  (was: )

> Integration tests generate date[DAY] values outside of reasonable range
> ---
>
> Key: ARROW-1791
> URL: https://issues.apache.org/jira/browse/ARROW-1791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The integration tests are generating random int32 values, but for systems 
> that use millisecond-based date objects (like JavaScript), converting to 
> millisecond date will cause an overflow in a lot of cases. We should generate 
> values that are within a reasonable year range so that overflows when 
> converting to milliseconds do not occur





[jira] [Commented] (ARROW-1773) [C++] Add casts from date/time types to compatible signed integers

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257838#comment-16257838
 ] 

ASF GitHub Bot commented on ARROW-1773:
---

wesm commented on issue #1310: ARROW-1773: [C++] Add casts from date/time types 
to compatible signed integers
URL: https://github.com/apache/arrow/pull/1310#issuecomment-345406562
 
 
   OK, this can be merged now on green build




> [C++] Add casts from date/time types to compatible signed integers
> --
>
> Key: ARROW-1773
> URL: https://issues.apache.org/jira/browse/ARROW-1773
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wes McKinney
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
>
> e.g.
> {code}
> In [3]: arr = pa.array([1,2,3], type='i4')
> In [4]: arr.cast('date32')
> Out[4]: 
> 
> [
>   datetime.date(1970, 1, 2),
>   datetime.date(1970, 1, 3),
>   datetime.date(1970, 1, 4)
> ]
> In [5]: arr.cast('date32').cast('i4')
> ---
> ArrowNotImplementedError  Traceback (most recent call last)
>  in ()
> > 1 arr.cast('date32').cast('i4')
> /home/wesm/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.cast 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:28923)()
> 266 
> 267 with nogil:
> --> 268 check_status(Cast(_context(), self.ap[0], type.sp_type,
> 269   options, ))
> 270 
> /home/wesm/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:8306)()
>  83 raise ArrowKeyError(message)
>  84 elif status.IsNotImplemented():
> ---> 85 raise ArrowNotImplementedError(message)
>  86 elif status.IsTypeError():
>  87 raise ArrowTypeError(message)
> ArrowNotImplementedError: 
> /home/wesm/code/arrow/cpp/src/arrow/compute/cast.cc:920 code: 
> GetCastFunction(*array.type(), out_type, options, )
> No cast implemented from date32[day] to int32
> {code}





[jira] [Commented] (ARROW-1773) [C++] Add casts from date/time types to compatible signed integers

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257833#comment-16257833
 ] 

ASF GitHub Bot commented on ARROW-1773:
---

wesm commented on issue #1310: ARROW-1773: [C++] Add casts from date/time types 
to compatible signed integers
URL: https://github.com/apache/arrow/pull/1310#issuecomment-345405830
 
 
   Casts from timestamp to int64 aren't implemented, so let me add these




> [C++] Add casts from date/time types to compatible signed integers
> --
>
> Key: ARROW-1773
> URL: https://issues.apache.org/jira/browse/ARROW-1773
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wes McKinney
>Assignee: Licht Takeuchi
>  Labels: pull-request-available
>
> e.g.
> {code}
> In [3]: arr = pa.array([1,2,3], type='i4')
> In [4]: arr.cast('date32')
> Out[4]: 
> 
> [
>   datetime.date(1970, 1, 2),
>   datetime.date(1970, 1, 3),
>   datetime.date(1970, 1, 4)
> ]
> In [5]: arr.cast('date32').cast('i4')
> ---
> ArrowNotImplementedError  Traceback (most recent call last)
>  in ()
> > 1 arr.cast('date32').cast('i4')
> /home/wesm/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.cast 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:28923)()
> 266 
> 267 with nogil:
> --> 268 check_status(Cast(_context(), self.ap[0], type.sp_type,
> 269   options, ))
> 270 
> /home/wesm/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status 
> (/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:8306)()
>  83 raise ArrowKeyError(message)
>  84 elif status.IsNotImplemented():
> ---> 85 raise ArrowNotImplementedError(message)
>  86 elif status.IsTypeError():
>  87 raise ArrowTypeError(message)
> ArrowNotImplementedError: 
> /home/wesm/code/arrow/cpp/src/arrow/compute/cast.cc:920 code: 
> GetCastFunction(*array.type(), out_type, options, )
> No cast implemented from date32[day] to int32
> {code}





[jira] [Assigned] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1791:
---

Assignee: Wes McKinney

> Integration tests generate date[DAY] values outside of reasonable range
> ---
>
> Key: ARROW-1791
> URL: https://issues.apache.org/jira/browse/ARROW-1791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
> The integration tests are generating random int32 values, but for systems 
> that use millisecond-based date objects (like JavaScript), converting to 
> millisecond date will cause an overflow in a lot of cases. We should generate 
> values that are within a reasonable year range so that overflows when 
> converting to milliseconds do not occur





[jira] [Assigned] (ARROW-1805) [Python] ignore non-parquet files when exploring dataset

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1805:
---

Assignee: Manuel Valdés

> [Python] ignore non-parquet files when exploring dataset
> 
>
> Key: ARROW-1805
> URL: https://issues.apache.org/jira/browse/ARROW-1805
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Manuel Valdés
>Assignee: Manuel Valdés
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When exploring a ParquetDataset, some files 
> (_metadata,_common_metadata,_SUCCESS) should be ignored when determining if a 
> directory follows a valid structure
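The proposed check amounts to a filename predicate. A minimal sketch (`is_parquet_data_file` is a hypothetical name for illustration; the actual patch lives in pyarrow's dataset-discovery code):

```python
# Helper files that Spark/Hadoop write alongside Parquet data and that
# should not count when validating a dataset directory's structure.
IGNORED_FILES = {'_metadata', '_common_metadata', '_SUCCESS'}

def is_parquet_data_file(path):
    # Treat a path as data unless its basename is a known helper file
    # or a hidden file.
    base = path.rsplit('/', 1)[-1]
    return base not in IGNORED_FILES and not base.startswith('.')

assert is_parquet_data_file('aaa/part-0.parquet')
assert not is_parquet_data_file('aaa/_SUCCESS')
```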





[jira] [Updated] (ARROW-1280) [C++] Implement Fixed Size List type

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1280:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [C++] Implement Fixed Size List type
> 
>
> Key: ARROW-1280
> URL: https://issues.apache.org/jira/browse/ARROW-1280
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>






[jira] [Resolved] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1559.
-
Resolution: Fixed

Issue resolved by pull request 1266
[https://github.com/apache/arrow/pull/1266]

> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257728#comment-16257728
 ] 

ASF GitHub Bot commented on ARROW-1559:
---

wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor 
DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-345390578
 
 
   Phew, looks like we are finally in the clear. +1, I will merge when the
build passes and then get busy with follow-ups.




> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257649#comment-16257649
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345378644
 
 
   This doesn't fail for me locally:
   
   ```
   $ ../cpp/build/debug/json-integration-test --integration 
--json=/tmp/tmp0jga4tt5/generated_primitive.json --arrow=foo.arrow 
--mode=JSON_TO_ARROW
   Found schema: bool_nullable: bool
   bool_nonnullable: bool not null
   int8_nullable: int8
   int8_nonnullable: int8 not null
   int16_nullable: int16
   int16_nonnullable: int16 not null
   int32_nullable: int32
   int32_nonnullable: int32 not null
   int64_nullable: int64
   int64_nonnullable: int64 not null
   uint8_nullable: uint8
   uint8_nonnullable: uint8 not null
   uint16_nullable: uint16
   uint16_nonnullable: uint16 not null
   uint32_nullable: uint32
   uint32_nonnullable: uint32 not null
   uint64_nullable: uint64
   uint64_nonnullable: uint64 not null
   float32_nullable: float
   float32_nonnullable: float not null
   float64_nullable: double
   float64_nonnullable: double not null
   binary_nullable: binary
   binary_nonnullable: binary not null
   utf8_nullable: string
   utf8_nonnullable: string not null
   ```
   




> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257644#comment-16257644
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345377333
 
 
   Sorry that I missed the thing. I will figure out what's going on here:
   
   > Then I had to manually edit the "primitive.json" file to remove the 
"binary_nullable" and "binary_nonnullable" columns, because the C++ command 
fails if they're present (click to expand)
   
   ```
   $ ../cpp/build/release/json-integration-test \
 --integration --mode=JSON_TO_ARROW \
 --json=./test/arrows/json/primitive.json \
 --arrow=./test/arrows/cpp/file/primitive.arrow
   Found schema: bool_nullable: bool
   bool_nonnullable: bool not null
   int8_nullable: int8
   int8_nonnullable: int8 not null
   int16_nullable: int16
   int16_nonnullable: int16 not null
   int32_nullable: int32
   int32_nonnullable: int32 not null
   int64_nullable: int64
   int64_nonnullable: int64 not null
   uint8_nullable: uint8
   uint8_nonnullable: uint8 not null
   uint16_nullable: uint16
   uint16_nonnullable: uint16 not null
   uint32_nullable: uint32
   uint32_nonnullable: uint32 not null
   uint64_nullable: uint64
   uint64_nonnullable: uint64 not null
   float32_nullable: float
   float32_nonnullable: float not null
   float64_nullable: double
   float64_nonnullable: double not null
   binary_nullable: binary
   binary_nonnullable: binary not null
   utf8_nullable: string
   utf8_nonnullable: string not null
   Error message: Invalid: Encountered non-hex digit
   ```




> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257614#comment-16257614
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

wesm commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151794812
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+
+
+http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
+
+
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257609#comment-16257609
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

icexelloss commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151794495
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+
+
+http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
+
+
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257587#comment-16257587
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345368042
 
 
   Sorry, I have been dragging my feet because I’m not really on board with 
checking in data files that can be generated as part of CI. Per the Slack 
conversation, it seems there are some roadblocks, so I’m available as needed 
today and tomorrow to get this sorted out.




> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257568#comment-16257568
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345364755
 
 
   @wesm I understand you may be busy, so do you mind if I go ahead and merge 
this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Updated] (ARROW-1454) [Python] More informative error message when attempting to write an unsupported Arrow type to Parquet format

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1454:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [Python] More informative error message when attempting to write an 
> unsupported Arrow type to Parquet format
> 
>
> Key: ARROW-1454
> URL: https://issues.apache.org/jira/browse/ARROW-1454
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>
> See https://github.com/pandas-dev/pandas/issues/17102#issuecomment-326746184





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257374#comment-16257374
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

wesm commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151758440
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+<?xml version="1.0"?>
+<!DOCTYPE module PUBLIC
+    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
+    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-17 Thread Li Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257279#comment-16257279
 ] 

Li Jin commented on ARROW-1816:
---

I will look at the cell level a bit closer; my hunch is that branching can be 
avoided by storing a timeUnit object in the vector:
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/NullableTimeStampNanoVector.java#L116

timeZone doesn't seem to be used at the cell level either, but I need to look 
closer.
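The idea described above can be sketched outside Java; below is a minimal Python illustration (class and method names are hypothetical, not Arrow's actual API) of resolving the time unit once at construction so the per-cell accessor carries no branch:

```python
# Hedged sketch: the vector stores its unit's scale factor once, so the
# hot-path accessor does a plain multiply instead of a per-cell switch.
NANOS_PER_UNIT = {"SECOND": 10**9, "MILLISECOND": 10**6,
                  "MICROSECOND": 10**3, "NANOSECOND": 1}

class TimeStampVector:
    def __init__(self, unit, values):
        self._scale = NANOS_PER_UNIT[unit]  # resolved once, not per cell
        self._values = values

    def get_nanos(self, i):
        # no branch on the unit in the hot path
        return self._values[i] * self._scale

v = TimeStampVector("MILLISECOND", [1, 2, 3])
print(v.get_nanos(1))  # 2000000
```

The same hoisting is what storing a timeUnit object in the Java vector would buy: the JIT sees a branch-free accessor.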

> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388





[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257257#comment-16257257
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

pcmoritz commented on issue #1327: ARROW-1795: [Plasma] Create flag to make 
Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345306499
 
 
   Hey, the following instruction will make sure that the big mmap memory file 
will be reused and no new one will be created: 
https://github.com/apache/arrow/blob/cacbacd439919742a0b6fbec27ee73b5af29347f/cpp/src/plasma/store.cc#L825
   If dlmemalign sees that there is not enough space and the mmap file is 
already at system_memory, it will return a null pointer and then we call the 
eviction policy to evict something (see the loop around dlmemalign).
   
   If hypothetically dlfree was called on the last object in the store, the 
mmap file might be unmapped and remapped when a new object is dlmemaligned; 
this would only happen in corner cases (i.e. if there is only one or very few 
large objects that are already released and they need to be evicted). We could 
prevent it by mallocing a very small object at the beginning and not freeing 
it. If you are concerned about this and can show it actually happens and want 
to submit a PR please go ahead (in this case it makes sense to increase the 
system_memory by the size of the small object), but it is not even clear that 
dlmalloc would unmap the last memory mapped file. And even if it happens, the 
behavior would still be correct...
   
   Hope that helps!
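The allocate/evict loop described above can be sketched as follows; this is a hedged Python simulation of the control flow only (names are illustrative; the real logic lives in Plasma's C++ store around dlmemalign):

```python
# Hedged sketch: allocation is retried, and on failure the eviction policy
# is asked to free space before trying again.
def allocate_with_eviction(request, store, capacity):
    """store: dict of object_id -> size; returns True if the request fits."""
    while True:
        used = sum(store.values())
        if used + request <= capacity:   # dlmemalign would succeed here
            return True
        if not store:                    # nothing left to evict
            return False
        # eviction policy stand-in: evict the largest released object
        victim = max(store, key=store.get)
        del store[victim]

store = {"a": 5, "b": 2}
print(allocate_with_eviction(6, store, capacity=8))  # True
```

In the 8 GB example, the 5 GB object is evicted on the first failed attempt and the retry then succeeds.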




> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: say we have 8 GB of free memory in total, we have input 5 GB of 
> data, and then another 6 GB of data comes in. If we choose to evict a full 
> 6 GB, an exception is thrown saying that no object can be freed. This is 
> because we didn't count the 3 GB of remaining free space. If we count that 
> remaining 3 GB, we only need to ask for 3 GB; thus we can evict the 5 GB of 
> data and we stay alive.
> Case 2: if we have 10 GB of free memory, we input 1.5 GB of data, and then 
> another 9 GB comes in. If we evict 10 * 20% = 2 GB, we will crash. In this 
> situation we need to evict 9 + 1.5 - 10 = 0.5 GB.
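The arithmetic in both cases reduces to one rule: evict only what the incoming request cannot fit into the existing free space. A minimal sketch (the function name is hypothetical, not Plasma's API):

```python
# Hedged sketch of the eviction-size rule the two cases above suggest:
# count the free space already available, not a fixed fraction of capacity.
def bytes_to_evict(capacity_gb, used_gb, incoming_gb):
    """Minimum amount to evict so the incoming request fits (all in GB)."""
    free_gb = capacity_gb - used_gb
    return max(0, incoming_gb - free_gb)

# Case 1: 8 GB capacity, 5 GB used, 6 GB incoming -> evict 3 GB, not 6 GB.
print(bytes_to_evict(8, 5, 6))     # 3
# Case 2: 10 GB capacity, 1.5 GB used, 9 GB incoming -> evict 0.5 GB,
# not 20% of capacity (2 GB).
print(bytes_to_evict(10, 1.5, 9))  # 0.5
```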





[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257255#comment-16257255
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

pcmoritz commented on issue #1327: ARROW-1795: [Plasma] Create flag to make 
Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345306499
 
 
   Hey, the following instruction will make sure that the big mmap memory file 
will be reused and no new one will be created: 
https://github.com/apache/arrow/blob/cacbacd439919742a0b6fbec27ee73b5af29347f/cpp/src/plasma/store.cc#L825
   If dlmemalign sees that there is not enough space and the mmap file is 
already at system_memory, it will return a null pointer and then we call the 
eviction policy to evict something (see the loop around dlmemalign).
   
   If hypothetically dlfree was called on the last object in the store, the 
mmap file might be unmapped and remapped when a new object is dlmemaligned; 
this would only happen in corner cases (i.e. if there is only one or very few 
large objects that are already released and they need to be evicted). We could 
prevent it by mallocing a very small object at the beginning and not freeing 
it. If you are concerned about this and can show it actually happens and want 
to submit a PR please go ahead, but it is not even clear that dlmalloc would 
unmap the last memory mapped file.
   
   Hope that helps!




> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: say we have 8 GB of free memory in total, we have input 5 GB of 
> data, and then another 6 GB of data comes in. If we choose to evict a full 
> 6 GB, an exception is thrown saying that no object can be freed. This is 
> because we didn't count the 3 GB of remaining free space. If we count that 
> remaining 3 GB, we only need to ask for 3 GB; thus we can evict the 5 GB of 
> data and we stay alive.
> Case 2: if we have 10 GB of free memory, we input 1.5 GB of data, and then 
> another 9 GB comes in. If we evict 10 * 20% = 2 GB, we will crash. In this 
> situation we need to evict 9 + 1.5 - 10 = 0.5 GB.





[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257250#comment-16257250
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

pcmoritz commented on issue #1327: ARROW-1795: [Plasma] Create flag to make 
Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345306499
 
 
   Hey, the following instruction will make sure that the big mmap memory file 
will be reused and no new one will be created: 
https://github.com/apache/arrow/blob/cacbacd439919742a0b6fbec27ee73b5af29347f/cpp/src/plasma/store.cc#L825
   If dlmemalign sees that there is not enough space and the mmap file is 
already at system_memory, it will return a null pointer and then we call the 
eviction policy to evict something (see the loop around dlmemalign).
   
   If hypothetically dlfree was called on the last object in the store, the 
mmap file might be unmapped and remapped when a new object is dlmemaligned; 
this would only happen in corner cases (i.e. if there is only one or very few 
large objects that are already released and they need to be evicted) and even 
if it happened it would be ok (it would only come at the cost of a little bit 
of latency).
   
   Hope that helps!




> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: say we have 8 GB of free memory in total, we have input 5 GB of 
> data, and then another 6 GB of data comes in. If we choose to evict a full 
> 6 GB, an exception is thrown saying that no object can be freed. This is 
> because we didn't count the 3 GB of remaining free space. If we count that 
> remaining 3 GB, we only need to ask for 3 GB; thus we can evict the 5 GB of 
> data and we stay alive.
> Case 2: if we have 10 GB of free memory, we input 1.5 GB of data, and then 
> another 9 GB comes in. If we evict 10 * 20% = 2 GB, we will crash. In this 
> situation we need to evict 9 + 1.5 - 10 = 0.5 GB.





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257252#comment-16257252
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

toddlipcon commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151741115
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+<?xml version="1.0"?>
+<!DOCTYPE module PUBLIC
+    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
+    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1795) [Plasma C++] change evict policy

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257251#comment-16257251
 ] 

ASF GitHub Bot commented on ARROW-1795:
---

pcmoritz commented on issue #1327: ARROW-1795: [Plasma] Create flag to make 
Plasma store use a single memory-mapped file.
URL: https://github.com/apache/arrow/pull/1327#issuecomment-345306499
 
 
   Hey, the following instruction will make sure that the big mmap memory file 
will be reused and no new one will be created: 
https://github.com/apache/arrow/blob/cacbacd439919742a0b6fbec27ee73b5af29347f/cpp/src/plasma/store.cc#L825
   If dlmemalign sees that there is not enough space and the mmap file is 
already at system_memory, it will return a null pointer and then we call the 
eviction policy to evict something (see the loop around dlmemalign).
   
   If hypothetically dlfree was called on the last object in the store, the 
mmap file might be unmapped and remapped when a new object is dlmemaligned; 
this would only happen in corner cases (i.e. if there is only one or very few 
large objects that are already released and they need to be evicted) and even 
if it happened it would be ok (it would only come at the cost of a little bit 
of latency).
   
   Hope that helps!




> [Plasma C++] change evict policy
> 
>
> Key: ARROW-1795
> URL: https://issues.apache.org/jira/browse/ARROW-1795
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Lu Qi 
>Assignee: Robert Nishihara
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Case 1: say we have 8 GB of free memory in total, we have input 5 GB of 
> data, and then another 6 GB of data comes in. If we choose to evict a full 
> 6 GB, an exception is thrown saying that no object can be freed. This is 
> because we didn't count the 3 GB of remaining free space. If we count that 
> remaining 3 GB, we only need to ask for 3 GB; thus we can evict the 5 GB of 
> data and we stay alive.
> Case 2: if we have 10 GB of free memory, we input 1.5 GB of data, and then 
> another 9 GB comes in. If we evict 10 * 20% = 2 GB, we will crash. In this 
> situation we need to evict 9 + 1.5 - 10 = 0.5 GB.





[jira] [Updated] (ARROW-1345) [Python] Conversion from nested NumPy arrays fails on integers other than int64, float32

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1345:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [Python] Conversion from nested NumPy arrays fails on integers other than 
> int64, float32
> 
>
> Key: ARROW-1345
> URL: https://issues.apache.org/jira/browse/ARROW-1345
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>
> The inferred types are the largest ones, and later conversion then fails on 
> any arrays with smaller types, because only exact conversions are implemented 
> so far.





[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257183#comment-16257183
 ] 

Wes McKinney commented on ARROW-1769:
-

Is there something actionable we could do in pyarrow?

> Python: pyarrow.parquet.write_to_dataset creates cyclic references
> --
>
> Key: ARROW-1769
> URL: https://issues.apache.org/jira/browse/ARROW-1769
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Uwe L. Korn
> Fix For: 0.8.0
>
>
> See https://github.com/apache/arrow/issues/1285 for the initial issue. Cyclic 
> references are a valid state in Python, since the garbage collector can clean 
> them up. But because the garbage collector runs at a point that is not 
> predictable to the user, and we typically deal with large objects here, we 
> should get rid of the cyclic reference so that data is evicted from main 
> memory as soon as possible.
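The behavior at issue can be seen with a small stdlib-only example: a reference cycle survives until the collector happens to run, which is why breaking the cycle eagerly matters for large buffers:

```python
# Illustration: objects in a reference cycle are not freed by reference
# counting alone; they wait for the cycle-detecting garbage collector.
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a          # cyclic reference
ref = weakref.ref(a)

del a, b                         # refcounts never reach zero: cycle keeps them
assert ref() is not None         # still alive until the collector runs
gc.collect()                     # collector finds and frees the cycle
assert ref() is None             # now reclaimed
```

Dropping the cycle (e.g. clearing `a.other` before releasing the objects) would free the memory immediately, without waiting for `gc.collect()`.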





[jira] [Commented] (ARROW-1682) [Python] Add documentation / example for reading a directory of Parquet files on S3

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257184#comment-16257184
 ] 

Wes McKinney commented on ARROW-1682:
-

Documentation patches would be welcome

> [Python] Add documentation / example for reading a directory of Parquet files 
> on S3
> ---
>
> Key: ARROW-1682
> URL: https://issues.apache.org/jira/browse/ARROW-1682
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>
> Opened based on comment 
> https://github.com/apache/arrow/pull/916#issuecomment-337563492





[jira] [Updated] (ARROW-1682) [Python] Add documentation / example for reading a directory of Parquet files on S3

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1682:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [Python] Add documentation / example for reading a directory of Parquet files 
> on S3
> ---
>
> Key: ARROW-1682
> URL: https://issues.apache.org/jira/browse/ARROW-1682
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>
> Opened based on comment 
> https://github.com/apache/arrow/pull/916#issuecomment-337563492





[jira] [Commented] (ARROW-1816) [Java] Resolve new vector classes structure for timestamp, date and maybe interval

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257181#comment-16257181
 ] 

Wes McKinney commented on ARROW-1816:
-

I think the issue here is to avoid branching at the array-cell level for Java. 
If there is a switch branch on the hot path for accessing values, the JIT may 
generate worse code.

> [Java] Resolve new vector classes structure for timestamp, date and maybe 
> interval
> --
>
> Key: ARROW-1816
> URL: https://issues.apache.org/jira/browse/ARROW-1816
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Li Jin
> Fix For: 0.8.0
>
>
> Personally I think having 8 vector classes for timestamps is not great. This 
> is discussed at some point during the PR:
> https://github.com/apache/arrow/pull/1203#discussion_r145241388





[jira] [Updated] (ARROW-1470) [C++] Add BufferAllocator abstract interface

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1470:

Fix Version/s: (was: 0.8.0)
   0.9.0

> [C++] Add BufferAllocator abstract interface
> 
>
> Key: ARROW-1470
> URL: https://issues.apache.org/jira/browse/ARROW-1470
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.9.0
>
>
> There are some situations (e.g. {{arrow::ipc::SerializeRecordBatch}}) where 
> we pass a {{MemoryPool*}} solely to call {{AllocateBuffer}} with it. This is 
> not as flexible as it could be, since there are situations where we may wish 
> to allocate from shared memory instead. 
> So instead:
> {code}
> Func(..., BufferAllocator* allocator, ...) {
>   ...
>   std::shared_ptr<Buffer> buffer;
>   RETURN_NOT_OK(allocator->Allocate(nbytes, &buffer));
>   ...
> }
> {code}
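The same inversion can be illustrated in Python; this is a hedged sketch (names are illustrative, not Arrow's API) of passing an allocator interface so callers can substitute, for example, a shared-memory allocator:

```python
# Hedged sketch: functions accept an abstract allocator rather than a
# concrete memory pool, so the memory source is the caller's choice.
from abc import ABC, abstractmethod

class BufferAllocator(ABC):
    @abstractmethod
    def allocate(self, nbytes):
        """Return a writable buffer of nbytes."""

class HeapAllocator(BufferAllocator):
    def allocate(self, nbytes):
        return bytearray(nbytes)   # a shared-memory variant could mmap here

def serialize(payload, allocator):
    buf = allocator.allocate(len(payload))  # caller picks the memory source
    buf[:] = payload
    return buf

print(bytes(serialize(b"abc", HeapAllocator())))  # b'abc'
```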





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257137#comment-16257137
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

icexelloss commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151716263
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+<?xml version="1.0"?>
+<!DOCTYPE module PUBLIC
+    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
+    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257138#comment-16257138
 ] 

ASF GitHub Bot commented on ARROW-1559:
---

wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor 
DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-345279306
 
 
   I lost access to the Windows machine I had been using for local debugging; I 
will have to set something up today to figure out why this isn't linking.




> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> --
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>  Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1820) [C++] Create arrow_compute shared library subcomponent

2017-11-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257115#comment-16257115
 ] 

Wes McKinney commented on ARROW-1820:
-

Sounds good to me

> [C++] Create arrow_compute shared library subcomponent
> --
>
> Key: ARROW-1820
> URL: https://issues.apache.org/jira/browse/ARROW-1820
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
> Fix For: 0.8.0
>
>
> I think it would be good to do this before we get in too deep. 
> {{libarrow_python}} will need to link to this library





[jira] [Closed] (ARROW-1820) [C++] Create arrow_compute shared library subcomponent

2017-11-17 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1820.
---

> [C++] Create arrow_compute shared library subcomponent
> --
>
> Key: ARROW-1820
> URL: https://issues.apache.org/jira/browse/ARROW-1820
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
> Fix For: 0.8.0
>
>
> I think it would be good to do this before we get in too deep. 
> {{libarrow_python}} will need to link to this library





[jira] [Commented] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257091#comment-16257091
 ] 

ASF GitHub Bot commented on ARROW-1827:
---

wesm commented on a change in pull request #1326: ARROW-1827: [Java] Add 
checkstyle file and license template
URL: https://github.com/apache/arrow/pull/1326#discussion_r151705745
 
 

 ##
 File path: java/checkstyle/checkstyle.xml
 ##
 @@ -0,0 +1,238 @@
+<?xml version="1.0"?>
+<!DOCTYPE module PUBLIC
+    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
+    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
> Key: ARROW-1827
> URL: https://issues.apache.org/jira/browse/ARROW-1827
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Assignee: Li Jin
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-1820) [C++] Create arrow_compute shared library subcomponent

2017-11-17 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256734#comment-16256734
 ] 

Uwe L. Korn commented on ARROW-1820:


I would prefer the {{-DARROW_COMPUTE=off}} for now. This makes packaging a bit 
simpler.

> [C++] Create arrow_compute shared library subcomponent
> --
>
> Key: ARROW-1820
> URL: https://issues.apache.org/jira/browse/ARROW-1820
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
> Fix For: 0.8.0
>
>
> I think it would be good to do this before we get in too deep. 
> {{libarrow_python}} will need to link to this library


