[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336278823
 
 

 ##
 File path: site/docs/python-quickstart.md
 ##
 @@ -0,0 +1,40 @@
+
+
+# Examples
+
+## Inspect Table Metadata
 
 Review comment:
   The new wording sounds good. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238243
 
 

 ##
 File path: python/README.md
 ##
 @@ -15,6 +15,26 @@
  - limitations under the License.
  -->
 
-# Iceberg
-A python implementation of the Iceberg table format.
-See the project level README for more details: https://github.com/apache/incubator-iceberg
+# Iceberg Python
+
+Iceberg is a Python library for programmatic access to Iceberg table metadata as well as data access. The intention is to provide a functional subset of the Java library.
+
+## Getting Started
+
+We are not currently publishing to PyPI, so the best way to install the library is to clone the git repo and run `pip install -e`:
+
+```
+git clone https://github.com/apache/incubator-iceberg.git
+cd incubator-iceberg/python
+pip install -e .
 
 Review comment:
   Did the other changes to this file make it? Looks like the empty line is still there and I don't see the test instructions.





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238079
 
 

 ##
 File path: site/docs/python-quickstart.md
 ##
 @@ -0,0 +1,40 @@
+
+
+# Examples
+
+## Inspect Table Metadata
 
 Review comment:
   Sounds a little scary to me. We just want to make it clear that this isn't how to use an official release.





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336110220
 
 

 ##
 File path: site/docs/python-quickstart.md
 ##
 @@ -0,0 +1,40 @@
+
+
+# Examples
+
+## Inspect Table Metadata
 
 Review comment:
   It would be good to have the information on how to install the library in a section here. In user-facing docs like this, we need to be clear that installing from master is for development and testing purposes. We can't recommend using code unless it is a released version. That means the wording should be something like "Iceberg for Python is not yet released and published to PyPI. To try out the python library, you can install it using `pip install -e`: ..."





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108392
 
 

 ##
 File path: site/docs/python-api-intro.md
 ##
 @@ -0,0 +1,143 @@
+
+
+# Iceberg Python API
+
+Much of the Python API conforms to the Java API. You can get more info about the Java API [here](https://iceberg.apache.org/api/).
+
+
+## Tables
+
+The Table interface provides access to table metadata:
+
++ schema returns the current table schema
++ spec returns the current table partition spec
++ properties returns a map of key-value properties
++ currentSnapshot returns the current table snapshot
++ snapshots returns all valid snapshots for the table
++ snapshot(id) returns a specific snapshot by ID
++ location returns the table’s base location
+
+Tables also provide refresh to update the table to the latest version.
+
+### Scanning
+Iceberg table scans start by creating a TableScan object with new_scan.
+
+``` python
+scan = table.new_scan()
+```
+
+To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.
+
+``` python
+filtered_scan = scan.filter(Expressions.equal("id", 5))
+```
+
+String expressions can also be passed to the filter method.
+
+``` python
+filtered_scan = scan.filter("id=5")
+```
+
+Schema projections can be applied against a TableScan by passing a list of column names.
+
+``` python
+filtered_scan = scan.select(["col_1", "col_2", "col_3"])
+```
+
+Because some data types cannot be read using the Python library, a convenience method for excluding columns from projection is provided.
+
+``` python
+filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
+```
+
+
+Calls to configuration methods create a new TableScan so that each TableScan is immutable.
+
+When a scan is configured, plan_files, plan_tasks, and schema are used to return files, tasks, and the read projection.
+
+``` python
+scan = table.new_scan() \
+    .filter("id=5") \
+    .select(["id", "data"])
+
+projection = scan.schema
+for task in scan.plan_tasks():
+    print(task)
+```
+
+## Types
+
+Iceberg data types are located in iceberg.api.types.types.
+
+### Primitives
+
+Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like __decimal__ use factory methods:
+
+```python
+IntegerType.get()     # int
+DoubleType.get()      # double
+DecimalType.of(9, 2)  # decimal(9, 2)
+```
+
+### Nested types
+Structs, maps, and lists are created using factory methods in type classes.
+
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability.
+
+Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods.
+
+```python
+# struct<1 id: int, 2 data: optional string>
+struct = StructType.of([NestedField.required(1, "id", IntegerType.get()),
+                        NestedField.optional(2, "data", StringType.get())])
+```
+```python
+# map<1 key: int, 2 value: optional string>
+map_var = MapType.of_optional(1, IntegerType.get(),
+                              2, StringType.get())
+```
+```python
+# array<1 element: int>
+list_var = ListType.of_required(1, IntegerType.get())
+```
+
+## Expressions
+Iceberg’s expressions are used to configure table scans. To create expressions, use the factory methods in Expressions.
+
+Supported predicate expressions are:
+
++ __is_null__
 
 Review comment:
   Could you use fixed-width here instead of bold?
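
The quoted docs say that each configuration call (filter, select, select_except) returns a new TableScan, so every scan is immutable. Below is a minimal, self-contained sketch of that copy-on-configure pattern, assuming plain Python callables as filters and in-memory rows. `ScanSketch`, its `read` method, and the toy rows are hypothetical illustrations, not the Iceberg Python library, whose `filter` takes `Expressions` or strings instead.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class ScanSketch:
    """Hypothetical immutable scan; NOT the Iceberg library's TableScan."""

    schema: Tuple[str, ...]  # every column name in the (toy) table
    rows: Tuple[Row, ...]  # in-memory rows standing in for data files
    predicate: Optional[Callable[[Row], bool]] = None
    projection: Optional[Tuple[str, ...]] = None

    def filter(self, predicate: Callable[[Row], bool]) -> "ScanSketch":
        # Returns a copy with the predicate set; self is never mutated.
        return replace(self, predicate=predicate)

    def select(self, columns: List[str]) -> "ScanSketch":
        return replace(self, projection=tuple(columns))

    def select_except(self, excluded: List[str]) -> "ScanSketch":
        # Project every column except the excluded ones, keeping schema order.
        skip = set(excluded)
        return replace(self, projection=tuple(c for c in self.schema if c not in skip))

    def read(self) -> List[Row]:
        # Apply the predicate, then the projection (all columns if none set).
        cols = self.projection if self.projection is not None else self.schema
        return [
            {c: row[c] for c in cols}
            for row in self.rows
            if self.predicate is None or self.predicate(row)
        ]


rows = (
    {"id": 5, "data": "a", "blob": b"x"},
    {"id": 6, "data": "b", "blob": b"y"},
)
scan = ScanSketch(schema=("id", "data", "blob"), rows=rows)
filtered = scan.filter(lambda r: r["id"] == 5).select_except(["blob"])

print(filtered.read())   # [{'id': 5, 'data': 'a'}]
print(len(scan.read()))  # 2: the original scan is unchanged
```

A frozen dataclass plus `dataclasses.replace` is just one way to get this behavior; the library itself may implement immutability differently.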





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108513
 
 

 ##
 File path: site/docs/python-api-intro.md
 ##
 @@ -0,0 +1,143 @@
+
+
+# Iceberg Python API
+
+Much of the Python API conforms to the Java API. You can get more info about the Java API [here](https://iceberg.apache.org/api/).
+
+
+## Tables
+
+The Table interface provides access to table metadata:
+
++ schema returns the current table schema
 
 Review comment:
   Using a fixed-width font here for method names would assist readability.





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108312
 
 

 ##
 File path: site/docs/python-api-intro.md
 ##
 @@ -0,0 +1,143 @@
+
+
+# Iceberg Python API
+
+Much of the Python API conforms to the Java API. You can get more info about the Java API [here](https://iceberg.apache.org/api/).
+
+
+## Tables
+
+The Table interface provides access to table metadata:
+
++ schema returns the current table schema
++ spec returns the current table partition spec
++ properties returns a map of key-value properties
++ currentSnapshot returns the current table snapshot
++ snapshots returns all valid snapshots for the table
++ snapshot(id) returns a specific snapshot by ID
++ location returns the table’s base location
+
+Tables also provide refresh to update the table to the latest version.
+
+### Scanning
+Iceberg table scans start by creating a TableScan object with new_scan.
+
+``` python
+scan = table.new_scan()
+```
+
+To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes.
+
+``` python
+filtered_scan = scan.filter(Expressions.equal("id", 5))
+```
+
+String expressions can also be passed to the filter method.
+
+``` python
+filtered_scan = scan.filter("id=5")
+```
+
+Schema projections can be applied against a TableScan by passing a list of column names.
+
+``` python
+filtered_scan = scan.select(["col_1", "col_2", "col_3"])
+```
+
+Because some data types cannot be read using the Python library, a convenience method for excluding columns from projection is provided.
+
+``` python
+filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"])
+```
+
+
+Calls to configuration methods create a new TableScan so that each TableScan is immutable.
+
+When a scan is configured, plan_files, plan_tasks, and schema are used to return files, tasks, and the read projection.
+
+``` python
+scan = table.new_scan() \
+    .filter("id=5") \
+    .select(["id", "data"])
+
+projection = scan.schema
+for task in scan.plan_tasks():
+    print(task)
+```
+
+## Types
+
+Iceberg data types are located in iceberg.api.types.types.
+
+### Primitives
+
+Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like __decimal__ use factory methods:
+
+```python
+IntegerType.get()     # int
+DoubleType.get()      # double
+DecimalType.of(9, 2)  # decimal(9, 2)
+```
+
+### Nested types
+Structs, maps, and lists are created using factory methods in type classes.
+
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability.
+
+Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods.
 
 Review comment:
   For method names, we typically use fixed-width font, like this:
   
   ```
   ... using `NestedField.optional` or `NestedField.required`. Map value ...
   ```
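
The quoted docs describe struct fields, map keys/values, and list elements as nested fields that carry a field ID and nullability, written in comments like `struct<1 id: int, 2 data: optional string>`. The sketch below mimics that model under stated assumptions: `Primitive`, `Field`, `Struct`, `MapT`, `ListT`, `required`, `optional`, `map_of_optional`, `list_of_required`, and `field_ids` are hypothetical stand-ins, not the classes in `iceberg.api.types`. It only shows how IDs and nullability travel with each nested field and reproduces the doc's type-string notation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "string", "decimal(9, 2)"

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Field:
    # Hypothetical nested field: stable ID, name, type, and nullability.
    field_id: int  # stable ID that survives renames and reordering
    name: str
    field_type: "IcebergType"
    optional: bool

    def __str__(self) -> str:
        marker = "optional " if self.optional else ""
        return f"{self.field_id} {self.name}: {marker}{self.field_type}"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]

    def __str__(self) -> str:
        return "struct<" + ", ".join(str(f) for f in self.fields) + ">"


@dataclass(frozen=True)
class MapT:
    key: Field
    value: Field

    def __str__(self) -> str:
        return f"map<{self.key}, {self.value}>"


@dataclass(frozen=True)
class ListT:
    element: Field

    def __str__(self) -> str:
        return f"array<{self.element}>"


IcebergType = Union[Primitive, Struct, MapT, ListT]


def required(field_id: int, name: str, t: IcebergType) -> Field:
    return Field(field_id, name, t, optional=False)


def optional(field_id: int, name: str, t: IcebergType) -> Field:
    return Field(field_id, name, t, optional=True)


def map_of_optional(key_id, key_type, value_id, value_type) -> MapT:
    # Map keys are always required; only the value's nullability is chosen.
    return MapT(required(key_id, "key", key_type), optional(value_id, "value", value_type))


def list_of_required(element_id, element_type) -> ListT:
    return ListT(required(element_id, "element", element_type))


def field_ids(t: IcebergType) -> List[int]:
    """Collect every nested field ID, depth-first."""
    if isinstance(t, Struct):
        fields = t.fields
    elif isinstance(t, MapT):
        fields = (t.key, t.value)
    elif isinstance(t, ListT):
        fields = (t.element,)
    else:
        return []
    ids: List[int] = []
    for f in fields:
        ids.append(f.field_id)
        ids.extend(field_ids(f.field_type))
    return ids


INT, STRING = Primitive("int"), Primitive("string")
struct = Struct((required(1, "id", INT), optional(2, "data", STRING)))
print(struct)                              # struct<1 id: int, 2 data: optional string>
print(map_of_optional(1, INT, 2, STRING))  # map<1 key: int, 2 value: optional string>
print(list_of_required(1, INT))            # array<1 element: int>

nested = Struct((required(1, "id", INT), optional(2, "tags", list_of_required(3, STRING))))
print(field_ids(nested))                   # [1, 2, 3]
```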





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107609
 
 

 ##
 File path: python/README.md
 ##
 @@ -15,6 +15,26 @@
  - limitations under the License.
  -->
 
-# Iceberg
-A python implementation of the Iceberg table format.
-See the project level README for more details: https://github.com/apache/incubator-iceberg
+# Iceberg Python
+
+Iceberg is a Python library for programmatic access to Iceberg table metadata as well as data access. The intention is to provide a functional subset of the Java library.
+
+## Getting Started
+
+We are not currently publishing to PyPI, so the best way to install the library is to clone the git repo and run `pip install -e`:
+
+```
+git clone https://github.com/apache/incubator-iceberg.git
+cd incubator-iceberg/python
+pip install -e .
 
 Review comment:
   This doesn't quite resolve #323 because it doesn't document how to run python tests. Could you add a section for that?





[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107337
 
 

 ##
 File path: python/README.md
 ##
 @@ -15,6 +15,26 @@
  - limitations under the License.
  -->
 
-# Iceberg
-A python implementation of the Iceberg table format.
-See the project level README for more details: https://github.com/apache/incubator-iceberg
+# Iceberg Python
+
+Iceberg is a Python library for programmatic access to Iceberg table metadata as well as data access. The intention is to provide a functional subset of the Java library.
+
+## Getting Started
+
+We are not currently publishing to PyPI, so the best way to install the library is to clone the git repo and run `pip install -e`:
+
+```
+git clone https://github.com/apache/incubator-iceberg.git
+cd incubator-iceberg/python
+pip install -e .
+
 
 Review comment:
   Nit: empty line.

