[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-10-02 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r142077712
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   Uhm, we probably don't want to tie the supported formats to the DB schema, so 
it may be better to just store a plain VARCHAR in the DB and limit the values 
through a select built from a config key, so each deployment can support 
whatever formats it wants.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-28 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r141714654
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   In my opinion we should keep the pandas datasource only for the simple 
cases: Excel, CSV, HTML and JSON. For more complex scenarios I think an Apache 
Arrow backend, as suggested by @mistercrunch, is the way to go.
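   For reference, each of those simple formats already has a stock pandas reader (`read_excel`, `read_csv`, `read_html`, `read_json`), so the connector could be little more than a dispatch table. A rough sketch; `READERS` and `load_dataframe` are illustrative names, not code from the PR:

```python
import io

import pandas as pd

# Illustrative mapping from a "simple" format to the stock pandas reader.
READERS = {
    'csv': pd.read_csv,
    'excel': pd.read_excel,  # needs an Excel engine (e.g. openpyxl) installed
    'json': pd.read_json,
    'html': pd.read_html,    # needs lxml, or beautifulsoup4 plus html5lib
}


def load_dataframe(source, fmt, **kwargs):
    """Read `source` (path, URL or file-like object) into a DataFrame."""
    if fmt not in READERS:
        raise ValueError('Unsupported format: {!r}'.format(fmt))
    result = READERS[fmt](source, **kwargs)
    if fmt == 'html':
        # read_html returns one DataFrame per <table>; keep the first.
        result = result[0]
    return result


df = load_dataframe(io.StringIO('a,b\n1,2\n3,4\n'), 'csv')
print(df.shape)
```

   Note that `read_html` returns a list with one DataFrame per `<table>`, so the sketch simply keeps the first one; a real connector would probably let the user pick.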
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-28 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r141667380
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   If HTML is a supported format, wouldn't parse_html be used then?
   
https://github.com/apache/incubator-superset/pull/3492/files#diff-e0528883c8ff7ba985be0482adcc33f5R18
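   For context, pandas' `read_html` does not parse HTML by itself: its `flavor` argument selects lxml, or BeautifulSoup4 together with html5lib. If HTML stays a supported format, these parsers would presumably be needed at runtime and not only for development. A small stdlib-only sketch (the helper names are hypothetical) to check which flavors are importable:

```python
import importlib.util


def is_importable(module_name):
    """True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_html_flavors():
    """List the pandas.read_html flavors whose parser packages are installed."""
    flavors = []
    if is_importable('lxml'):
        flavors.append('lxml')
    # The 'bs4' flavor needs both BeautifulSoup4 and html5lib.
    if is_importable('bs4') and is_importable('html5lib'):
        flavors.append('bs4')
    return flavors


print(available_html_flavors())
```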
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-28 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r141656739
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   The original comment was because you added lxml twice, not because you added 
two different libraries :)
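   A quick way to catch this kind of slip is to look for repeated project names in the requirements file. A small sketch, assuming plain `name==version` style lines (no `-e` or URL requirements); the helper names are hypothetical:

```python
import re


def normalize(name):
    """Normalize a project name the way PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r'[-_.]+', '-', name).lower()


def duplicate_requirements(lines):
    """Return requirement names that appear more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for line in lines:
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # The project name ends at the first version specifier, marker or extra.
        name = normalize(re.split(r'[\s<>=!~;\[]', line, maxsplit=1)[0])
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


dev_reqs = [
    'beautifulsoup4==4.6.0',
    'lxml==3.8.0',
    'codeclimate-test-reporter',
    'coveralls',
    'flake8',
    'flask_cors',
    'lxml==3.8.0',
]
print(duplicate_requirements(dev_reqs))  # ['lxml']
```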
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-28 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r141649750
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   Actually, I can't find where you are using lxml and beautifulsoup.
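   One way to double-check is to scan the connector's modules for direct imports. Note that this would not catch indirect use, since pandas' `read_html` imports lxml or bs4 internally. A minimal stdlib sketch (the helper name is hypothetical):

```python
import ast


def imported_top_level_modules(source):
    """Return top-level package names pulled in by absolute imports in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import models".
            if node.level == 0 and node.module:
                found.add(node.module.split('.')[0])
    return found


source = (
    "import pandas as pd\n"
    "from sqlalchemy.orm import backref\n"
    "from . import models\n"
)
unused = {'lxml', 'bs4'} - imported_top_level_modules(source)
print(sorted(unused))
```

   In practice the source strings would come from reading each file under `contrib/connectors/pandas/`.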
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-28 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r141648151
 
 

 ##
 File path: dev-reqs.txt
 ##
 @@ -1,7 +1,10 @@
+beautifulsoup4==4.6.0
+lxml==3.8.0
 codeclimate-test-reporter
 coveralls
 flake8
 flask_cors
+lxml==3.8.0
 
 Review comment:
   Once should be enough.
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139923359
 
 

 ##
 File path: contrib/connectors/pandas/views.py
 ##
 @@ -0,0 +1,270 @@
+"""Views used by the SqlAlchemy connector"""
+import logging
+
+from past.builtins import basestring
+
+from flask import Markup, flash, redirect
+from flask_appbuilder import CompactCRUDMixin, expose
+from flask_appbuilder.models.sqla.interface import SQLAInterface
+import sqlalchemy as sa
+
+from flask_babel import lazy_gettext as _
+from flask_babel import gettext as __
+
+from superset import appbuilder, db, utils, security, sm
+from superset.utils import has_access
+from superset.connectors.base.views import DatasourceModelView
+from superset.views.base import (
+    SupersetModelView, ListWidgetWithCheckboxes, DeleteMixin, DatasourceFilter,
+    get_datasource_exist_error_mgs,
+)
+
+from . import models
+
+
+class PandasColumnInlineView(CompactCRUDMixin, SupersetModelView):  # noqa
+    datamodel = SQLAInterface(models.PandasColumn)
+
+    list_title = _('List Columns')
+    show_title = _('Show Column')
+    add_title = _('Add Column')
+    edit_title = _('Edit Column')
+
+    can_delete = False
+    list_widget = ListWidgetWithCheckboxes
+    edit_columns = [
+        'column_name', 'verbose_name', 'description',
+        'type', 'groupby', 'filterable',
+        'datasource', 'count_distinct', 'sum', 'min', 'max']
+    add_columns = edit_columns
+    list_columns = [
+        'column_name', 'verbose_name', 'type', 'groupby', 'filterable',
+        'count_distinct', 'sum', 'min', 'max']
+    page_size = 500
+    description_columns = {
+        'is_dttm': _(
+            "Whether to make this column available as a "
+            "[Time Granularity] option, column has to be DATETIME or "
+            "DATETIME-like"),
+        'filterable': _(
+            "Whether this column is exposed in the `Filters` section "
+            "of the explore view."),
+        'type': _(
+            "The data type that was inferred by Pandas. "
+            "It may be necessary to input a type manually for "
+            "expression-defined columns in some cases. In most case "
+            "users should not need to alter this."),
+    }
+    label_columns = {
+        'column_name': _("Column"),
+        'verbose_name': _("Verbose Name"),
+        'description': _("Description"),
+        'groupby': _("Groupable"),
+        'filterable': _("Filterable"),
+        'datasource': _("Datasource"),
+        'count_distinct': _("Count Distinct"),
+        'sum': _("Sum"),
+        'min': _("Min"),
+        'max': _("Max"),
+        'type': _('Type'),
+    }
+
+
+appbuilder.add_view_no_menu(PandasColumnInlineView)
+
+
+class PandasMetricInlineView(CompactCRUDMixin, SupersetModelView):  # noqa
+    datamodel = SQLAInterface(models.PandasMetric)
+
+    list_title = _('List Metrics')
+    show_title = _('Show Metric')
+    add_title = _('Add Metric')
+    edit_title = _('Edit Metric')
+
+    list_columns = ['metric_name', 'verbose_name', 'metric_type']
+    edit_columns = [
+        'metric_name', 'description', 'verbose_name', 'metric_type',
+        'source', 'expression', 'datasource', 'd3format', 'is_restricted']
+    description_columns = {
+        'source': utils.markdown(
+            "a comma-separated list of column(s) used to calculate "
+            " the metric. Example: `claim_amount`", True),
+        'expression': utils.markdown(
+            "a valid Pandas expression as supported by the underlying "
+            "backend. Example: `count()`", True),
+        'is_restricted': _("Whether the access to this metric is restricted "
+                           "to certain roles. Only roles with the permission "
+                           "'metric access on XXX (the name of this metric)' "
+                           "are allowed to access this metric"),
+        'd3format': utils.markdown(
+            "d3 formatting string as defined [here]"
+            "(https://github.com/d3/d3-format/blob/master/README.md#format). "
+            "For instance, this default formatting applies in the Table "
+            "visualization and allow for different metric to use different "
+            "formats", True
+        ),
+    }
+    add_columns = edit_columns
+    page_size = 500
+    label_columns = {
+        'metric_name': _("Metric"),
+        'description': _("Description"),
+        'verbose_name': _("Verbose Name"),
+        'metric_type': _("Type"),
+        'source': _("Pandas Source Columns"),
+        'expression': _("Pandas Expression"),
+        'datasource': _("Datasource"),
+        'd3format': _("D3 Format"),
+        'is_restricted': _('Is Restricted')
+    }
+
+    def post_add(self, metric):
+        if metric.is_restricted:
+            security.merge_perm(sm, 'metric_access', metric.get_perm())
+
+    def post_update(self, metric):
+        if

[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139912457
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
 @@ -0,0 +1,724 @@
+from collections import OrderedDict
+from datetime import datetime
+import logging
+from past.builtins import basestring
+try:
+    from urllib.parse import urlparse
+except ImportError:
+    from urlparse import urlparse
+
+import pandas as pd
+from pandas.api.types import (
+    is_string_dtype, is_numeric_dtype, is_datetime64_any_dtype)
+
+from sqlalchemy import (
+    Column, Integer, String, ForeignKey, Text
+)
+import sqlalchemy as sa
+from sqlalchemy.orm import backref, relationship
+from sqlalchemy_utils import ChoiceType, JSONType
+
+from flask import escape, Markup
+from flask_appbuilder import Model
+from flask_babel import lazy_gettext as _
+
+from superset import db, utils, sm
+from superset.connectors.base.models import (
+    BaseDatasource, BaseColumn, BaseMetric)
+from superset.models.helpers import QueryResult, set_perm
+from superset.utils import QueryStatus
+
+
+class PandasDatabase(object):
+    """Non-ORM object for a Pandas Source"""
+    database_name = ''
+
+    cache_timeout = None
+
+    def __init__(self, database_name, cache_timeout):
+        self.database_name = database_name
+        self.cache_timeout = cache_timeout
+
+    def __str__(self):
+        return self.database_name
+
+
+class PandasColumn(Model, BaseColumn):
+    """
+    ORM object for Pandas columns.
+
+    Each Pandas Datasource can have multiple columns"""
+
+    __tablename__ = 'pandascolumns'
+
+    id = Column(Integer, primary_key=True)
+    pandasdatasource_id = Column(Integer, ForeignKey('pandasdatasources.id'))
+    datasource = relationship(
+        'PandasDatasource',
+        backref=backref('columns', cascade='all, delete-orphan'),
+        foreign_keys=[pandasdatasource_id])
+
+    @property
+    def is_num(self):
+        return self.type and is_numeric_dtype(self.type)
+
+    @property
+    def is_time(self):
+        return self.type and is_datetime64_any_dtype(self.type)
+
+    @property
+    def is_dttm(self):
+        return self.is_time
+
+    @property
+    def is_string(self):
+        return self.type and is_string_dtype(self.type)
+
+    num_types = (
+        'DOUBLE', 'FLOAT', 'INT', 'BIGINT',
+        'LONG', 'REAL', 'NUMERIC', 'DECIMAL'
+    )
+    date_types = ('DATE', 'TIME', 'DATETIME')
+    str_types = ('VARCHAR', 'STRING', 'CHAR')
+
+    @property
+    def expression(self):
+        return ''
+
+    @property
+    def data(self):
+        attrs = (
+            'column_name', 'verbose_name', 'description', 'expression',
+            'filterable', 'groupby')
+        return {s: getattr(self, s) for s in attrs}
+
+
+class PandasMetric(Model, BaseMetric):
+    """
+    ORM object for Pandas metrics.
+
+    Each Pandas Datasource can have multiple metrics
+    """
+
+    __tablename__ = 'pandasmetrics'
+
+    id = Column(Integer, primary_key=True)
+    pandasdatasource_id = Column(Integer, ForeignKey('pandasdatasources.id'))
+    datasource = relationship(
+        'PandasDatasource',
+        backref=backref('metrics', cascade='all, delete-orphan'),
+        foreign_keys=[pandasdatasource_id])
+    source = Column(Text)
+    expression = Column(Text)
+
+    @property
+    def perm(self):
+        if self.datasource:
+            return ('{parent_name}.[{obj.metric_name}]'
+                    '(id:{obj.id})').format(
+                obj=self,
+                parent_name=self.datasource.full_name)
+        return None
+
+
+class PandasDatasource(Model, BaseDatasource):
+    """A datasource based on a Pandas DataFrame"""
+
+    FORMATS = [
+        ('csv', 'CSV'),
+        ('html', 'HTML')
+    ]
+
+    # See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases # NOQA
+    GRAINS = OrderedDict([
+        ('5 seconds', '5S'),
+        ('30 seconds', '30S'),
+        ('1 minute', 'T'),
+        ('5 minutes', '5T'),
+        ('1 hour', 'H'),
+        ('6 hour', '6H'),
+        ('day', 'D'),
+        ('one day', 'D'),
+        ('1 day', 'D'),
+        ('7 days', '7D'),
+        ('week', 'W-MON'),
+        ('week_starting_sunday', 'W-SUN'),
+        ('week_ending_saturday', 'W-SUN'),
+        ('month', 'M'),
+        ('quarter', 'Q'),
+        ('year', 'A'),
+    ])
+
+    __tablename__ = 'pandasdatasources'
+    type = 'pandas'
+    baselink = 'pandasdatasourcemodelview'  # url portion pointing to ModelView endpoint
+    column_class = PandasColumn
+    metric_class = PandasMetric
+
+    name = Column(String(100), nullable=False)
+    source_url = Column(String(1000), nullable=False)
+    format = Column(String(20), nullable=False)
+    additional_parameters = Column(JSONType)
+
+    user_id = Column(Integer, ForeignKey('ab_user.id'))
+    owner = relationship(
+        sm.user_model,

[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139913251
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
 @@ -0,0 +1,724 @@
+from collections import OrderedDict
+from datetime import datetime
+import logging
+from past.builtins import basestring
+try:
+    from urllib.parse import urlparse
+except ImportError:
+    from urlparse import urlparse
+
+import pandas as pd
+from pandas.api.types import (
+    is_string_dtype, is_numeric_dtype, is_datetime64_any_dtype)
+
+from sqlalchemy import (
+    Column, Integer, String, ForeignKey, Text
+)
+import sqlalchemy as sa
+from sqlalchemy.orm import backref, relationship
+from sqlalchemy_utils import ChoiceType, JSONType
+
+from flask import escape, Markup
+from flask_appbuilder import Model
+from flask_babel import lazy_gettext as _
+
+from superset import db, utils, sm
+from superset.connectors.base.models import (
+    BaseDatasource, BaseColumn, BaseMetric)
+from superset.models.helpers import QueryResult, set_perm
+from superset.utils import QueryStatus
+
+
+class PandasDatabase(object):
+    """Non-ORM object for a Pandas Source"""
+    database_name = ''
 
 Review comment:
   I don't think you need to set class-level defaults if the arguments are not optional in __init__.
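   In other words, since both arguments are required, `__init__` always sets the instance attributes and the class-level values would only ever be seen through the class itself. A sketch of the trimmed class:

```python
class PandasDatabase(object):
    """Non-ORM object for a Pandas Source"""

    def __init__(self, database_name, cache_timeout):
        # Both arguments are required, so these instance attributes are
        # always set; no class-level defaults are needed.
        self.database_name = database_name
        self.cache_timeout = cache_timeout

    def __str__(self):
        return self.database_name


db = PandasDatabase('pandas', cache_timeout=None)
print(str(db))
```

   If `cache_timeout` is meant to be optional, the usual place for its default is the signature (`cache_timeout=None`) rather than the class body.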
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139914425
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
 @@ -0,0 +1,724 @@
+from collections import OrderedDict
+from datetime import datetime
+import logging
+from past.builtins import basestring
+try:
+    from urllib.parse import urlparse
+except ImportError:
+    from urlparse import urlparse
+
+import pandas as pd
+from pandas.api.types import (
+    is_string_dtype, is_numeric_dtype, is_datetime64_any_dtype)
+
+from sqlalchemy import (
+    Column, Integer, String, ForeignKey, Text
+)
+import sqlalchemy as sa
+from sqlalchemy.orm import backref, relationship
+from sqlalchemy_utils import ChoiceType, JSONType
+
+from flask import escape, Markup
+from flask_appbuilder import Model
+from flask_babel import lazy_gettext as _
+
+from superset import db, utils, sm
+from superset.connectors.base.models import (
+    BaseDatasource, BaseColumn, BaseMetric)
+from superset.models.helpers import QueryResult, set_perm
+from superset.utils import QueryStatus
+
+
+class PandasDatabase(object):
+    """Non-ORM object for a Pandas Source"""
+    database_name = ''
+
+    cache_timeout = None
+
+    def __init__(self, database_name, cache_timeout):
+        self.database_name = database_name
+        self.cache_timeout = cache_timeout
+
+    def __str__(self):
+        return self.database_name
+
+
+class PandasColumn(Model, BaseColumn):
+    """
+    ORM object for Pandas columns.
+
+    Each Pandas Datasource can have multiple columns"""
+
+    __tablename__ = 'pandascolumns'
 
 Review comment:
   We may want to be consistent with the sqla backend and call it 'pandas_columns'.
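   A sketch of what that could look like, assuming the same snake_case convention is applied to the metrics and datasources tables too (`pandas_metrics` and `pandas_datasources` are an extrapolation, not names from the PR). Plain declarative models on an in-memory SQLite engine stand in for Superset's `Model` base here:

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, String, Text, create_engine, inspect)
from sqlalchemy.orm import backref, relationship

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy >= 1.4
except ImportError:  # older SQLAlchemy releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class PandasDatasource(Base):
    __tablename__ = 'pandas_datasources'

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)


class PandasColumn(Base):
    __tablename__ = 'pandas_columns'

    id = Column(Integer, primary_key=True)
    column_name = Column(String(255))
    # The foreign key has to follow the renamed parent table.
    pandasdatasource_id = Column(Integer, ForeignKey('pandas_datasources.id'))
    datasource = relationship(
        'PandasDatasource',
        backref=backref('columns', cascade='all, delete-orphan'),
        foreign_keys=[pandasdatasource_id])


class PandasMetric(Base):
    __tablename__ = 'pandas_metrics'

    id = Column(Integer, primary_key=True)
    metric_name = Column(String(255))
    expression = Column(Text)
    pandasdatasource_id = Column(Integer, ForeignKey('pandas_datasources.id'))
    datasource = relationship(
        'PandasDatasource',
        backref=backref('metrics', cascade='all, delete-orphan'),
        foreign_keys=[pandasdatasource_id])


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
print(sorted(inspect(engine).get_table_names()))
```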
 



[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139916372
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
+

[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139916514
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
+

[GitHub] xrmx commented on a change in pull request #3492: PandasConnector

2017-09-20 Thread git
xrmx commented on a change in pull request #3492: PandasConnector
URL: 
https://github.com/apache/incubator-superset/pull/3492#discussion_r139915471
 
 

 ##
 File path: contrib/connectors/pandas/models.py
 ##
+